Incident Runbook

This runbook covers common production incidents for HFT71 Integration:

  • API outage
  • webhook failures
  • stock drift

Use this document during live incidents to speed up detection, mitigation, and recovery.

1) Incident severity and response targets

  • SEV-1: Order flow blocked (new processing orders cannot be sent to HFT71).
  • SEV-2: Partial degradation (webhook failing while polling still works, or stock sync degraded).
  • SEV-3: Minor inconsistency (small stock drift, intermittent retries, no customer impact yet).

Suggested response targets:

  • SEV-1: acknowledge within 5 minutes, mitigation within 30 minutes.
  • SEV-2: acknowledge within 15 minutes, mitigation within 2 hours.
  • SEV-3: acknowledge within 1 business day.
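To turn these targets into concrete deadlines when an incident is opened, a small helper like the Python sketch below can be used. It is illustrative only and the function names are hypothetical. It assumes "1 business day" means the same clock time on the next weekday, and it does not handle holidays.

```python
from datetime import datetime, timedelta, timezone

# Targets from the list above; SEV-3 has no mitigation target.
RESPONSE_TARGETS = {
    "SEV-1": {"ack": timedelta(minutes=5), "mitigate": timedelta(minutes=30)},
    "SEV-2": {"ack": timedelta(minutes=15), "mitigate": timedelta(hours=2)},
}


def add_business_day(start):
    """Same clock time on the next weekday (holidays ignored: an assumption)."""
    result = start + timedelta(days=1)
    while result.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        result += timedelta(days=1)
    return result


def response_deadlines(severity, detected_at):
    """Return {"ack": datetime, "mitigate": datetime or None}."""
    if severity == "SEV-3":
        return {"ack": add_business_day(detected_at), "mitigate": None}
    targets = RESPONSE_TARGETS[severity]
    return {name: detected_at + delta for name, delta in targets.items()}


if __name__ == "__main__":
    opened = datetime(2024, 5, 3, 14, 0, tzinfo=timezone.utc)  # a Friday
    print(response_deadlines("SEV-1", opened)["ack"])  # 14:05 UTC the same day
    print(response_deadlines("SEV-3", opened)["ack"])  # the following Monday
```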

2) Quick diagnostics

  1. Open WooCommerce > HFT71 Logi (the plugin's log screen).
  2. Check the most recent entries for these actions:
       • send_order
       • get_status
       • webhook
       • stock_sync
       • update_order, update_order_address
  3. Note recurring HTTP statuses:
       • 0 or timeout-like responses -> transport/connectivity issue
       • 401/403 -> auth/permissions/expired credentials
       • 404 -> endpoint/base URL mismatch
       • 429 -> rate limiting
       • 5xx -> remote API outage
  4. Verify plugin settings:
       • API base URL
       • username/password
       • webhook secret
       • polling/stock intervals
  5. Verify WP-Cron is running in production.
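The status-code triage above can be expressed as a lookup. This is useful when scanning a batch of exported log statuses. The Python sketch below is minimal and rests on two assumptions: status 0 is how the logs record requests that received no HTTP response, and the min_count threshold for "recurring" is an arbitrary choice.

```python
from collections import Counter


def classify_http_status(status):
    """Map a logged HTTP status to the likely cause from the triage list."""
    if status == 0:  # assumed: no HTTP response (timeout, DNS, TLS, refused)
        return "transport/connectivity issue"
    if status in (401, 403):
        return "auth/permissions/expired credentials"
    if status == 404:
        return "endpoint/base URL mismatch"
    if status == 429:
        return "rate limiting"
    if 500 <= status <= 599:
        return "remote API outage"
    return "unclassified"


def recurring_causes(statuses, min_count=3):
    """Causes seen at least min_count times (threshold is an arbitrary choice)."""
    counts = Counter(classify_http_status(s) for s in statuses)
    return {cause: n for cause, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    print(recurring_causes([0, 0, 0, 401, 503, 503, 503, 503]))
```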

3) Scenario A - API outage

Symptoms

  • send_order and/or get_status failures with timeouts or 5xx.
  • New processing orders are not getting _hft71_order_id.

Immediate mitigation

  • Confirm incident with HFT71 provider/status channel.
  • Keep WooCommerce checkout active unless business decides otherwise.
  • Inform fulfillment/ops that external handoff is delayed.
  • Avoid repeated manual resubmissions; they can create duplicate orders.

Recovery steps

  1. Confirm API availability is restored (successful auth + test call).
  2. Identify affected WooCommerce orders:
       • processing orders without _hft71_order_id
       • orders with failed send_order log entries in the outage time window
  3. Re-send impacted orders in controlled batches.
  4. Confirm each order receives _hft71_order_id.
  5. Monitor logs for 30-60 minutes for renewed failures.
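Identifying affected orders and re-sending them in batches can be sketched as follows. The Order shape is a hypothetical stand-in for however orders are exported (for example via the WooCommerce REST API or a database query). The guard on _hft71_order_id is what keeps a re-run from submitting the same order twice.

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    """Hypothetical export of a WooCommerce order."""
    id: int
    status: str                               # e.g. "processing"
    meta: dict = field(default_factory=dict)  # e.g. {"_hft71_order_id": "..."}


def affected_orders(orders, failed_send_ids):
    """Orders to re-send after the outage.

    Applies the two criteria above: processing orders without
    _hft71_order_id, or orders whose IDs appear in failed send_order log
    entries. Anything that already has _hft71_order_id is skipped so a
    re-run cannot submit it twice.
    """
    result = []
    for order in orders:
        if order.meta.get("_hft71_order_id"):
            continue
        if order.status == "processing" or order.id in failed_send_ids:
            result.append(order)
    return result


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Re-check _hft71_order_id after each batch rather than only at the end. That way, a partial failure never leads to a blind full re-run.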

Escalation data to collect

  • outage start/end time (UTC)
  • affected order count
  • representative request IDs/log entries
  • top failing endpoints/status codes
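The escalation data can be assembled from an export of the plugin's log entries. This sketch makes two assumptions. First, each entry is a dict with the hypothetical keys time (a timezone-aware datetime), endpoint, status and order_id. Second, a failure means status 0 or any status of 400 or above.

```python
from collections import Counter
from datetime import datetime, timezone


def escalation_summary(entries, start, end, top_n=3):
    """Summarize failed log entries between start and end (inclusive).

    entries: dicts with hypothetical keys "time" (timezone-aware datetime),
    "endpoint", "status" and "order_id". A failure is status 0 or >= 400.
    """
    failures = [
        e for e in entries
        if start <= e["time"] <= end and (e["status"] == 0 or e["status"] >= 400)
    ]
    order_ids = {e["order_id"] for e in failures if e.get("order_id") is not None}
    return {
        "window_utc": (start.astimezone(timezone.utc).isoformat(),
                       end.astimezone(timezone.utc).isoformat()),
        "affected_order_count": len(order_ids),
        "top_failures": Counter((e["endpoint"], e["status"]) for e in failures).most_common(top_n),
    }


if __name__ == "__main__":
    t0 = datetime(2024, 5, 3, 14, 0, tzinfo=timezone.utc)
    sample = [{"time": t0, "endpoint": "/orders", "status": 503, "order_id": 1}]
    print(escalation_summary(sample, t0, t0))
```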

4) Scenario B - Webhook failures

Symptoms

  • HFT71 reports webhook delivery failures.
  • Status not updating in WooCommerce via webhook path.
  • webhook log entries missing or rejected.

Immediate checks

  • Verify the endpoint URL: /wp-json/hft71/v1/order-status
  • If a webhook secret is enabled, verify it matches exactly, whether sent in the X-HFT71-Webhook-Secret header or in the webhook_secret body field.
  • Confirm the HTTPS certificate is valid and that no WAF/CDN rule is blocking the requests.
  • Confirm requests reach the origin (check web server access/error logs).
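When a secret mismatch is suspected, reproducing the check against a captured request quickly shows the cause. The secret may be missing, wrong, or different only by whitespace. This Python sketch mirrors the header-or-body rule above. It is not the plugin's own code, and checking the header before the body is an assumption.

```python
import hmac
import json


def extract_webhook_secret(headers, raw_body):
    """Return the secret from the header, else from the JSON body, else None."""
    for name, value in headers.items():
        if name.lower() == "x-hft71-webhook-secret":  # header names are case-insensitive
            return value
    try:
        body = json.loads(raw_body or b"{}")
    except ValueError:  # not JSON (also covers bad UTF-8)
        return None
    if isinstance(body, dict) and isinstance(body.get("webhook_secret"), str):
        return body["webhook_secret"]
    return None


def diagnose_secret(expected, headers, raw_body):
    """Classify a captured request: match / missing / whitespace mismatch / mismatch."""
    received = extract_webhook_secret(headers, raw_body)
    if received is None:
        return "missing"
    # Compare as bytes: hmac.compare_digest rejects non-ASCII str arguments.
    if hmac.compare_digest(received.encode("utf-8"), expected.encode("utf-8")):
        return "match"
    if received.strip() == expected.strip():
        return "whitespace mismatch"
    return "mismatch"
```

A "whitespace mismatch" result usually points to a copy-paste problem (such as a trailing newline) on one side. Fixing that is often simpler than rotating the secret.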

Mitigation

  • Keep polling enabled (hft71_poll_order_status) as fallback.
  • If webhook auth mismatch suspected, rotate secret and update both sides.

Recovery validation

  1. Send a controlled webhook test payload.
  2. Verify response is HTTP 200 with "success": true when mapping/order exists.
  3. Verify corresponding order status and _hft71_last_status update.
  4. Verify log entry with action webhook.
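For the controlled test, both building the request and checking the response can be scripted. This sketch assumes the webhook accepts a JSON POST. The actual HFT71 payload schema is not described here, so any payload fields you pass are placeholders.

```python
import json
import urllib.request

WEBHOOK_PATH = "/wp-json/hft71/v1/order-status"


def build_test_request(site_url, payload, secret=None):
    """Build (but do not send) a JSON POST to the webhook endpoint."""
    headers = {"Content-Type": "application/json"}
    if secret is not None:
        headers["X-HFT71-Webhook-Secret"] = secret
    return urllib.request.Request(
        site_url.rstrip("/") + WEBHOOK_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def webhook_succeeded(status_code, body_text):
    """True only for HTTP 200 with a JSON body containing "success": true."""
    if status_code != 200:
        return False
    try:
        body = json.loads(body_text)
    except ValueError:
        return False
    return isinstance(body, dict) and body.get("success") is True
```

Send the request with urllib.request.urlopen(req, timeout=10). For non-2xx responses, urlopen raises urllib.error.HTTPError instead of returning. Take the status from its code attribute and the body from its read() method, then pass both to webhook_succeeded.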

5) Scenario C - Stock drift

Stock drift means WooCommerce stock differs from HFT71 source values.

Symptoms

  • Product quantities in WooCommerce do not match GET /stock/available.
  • Frequent oversell/out-of-stock mismatch reports.

Immediate checks

  • Confirm hft71_stock_sync_enabled is true.
  • Confirm SKU integrity:
       • the WooCommerce SKU exists
       • it exactly matches the HFT71 SKU
  • Confirm the stock sync interval and cron execution.
  • Run a manual stock sync from the plugin settings.
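SKU integrity problems usually show up in two forms. Either a product has no SKU, or its SKU differs from the HFT71 value only in case or surrounding whitespace. This minimal Python sketch assumes product SKUs have been exported and that a list of HFT71 SKUs is available (for example, taken from the GET /stock/available response).

```python
def sku_integrity_report(woo_skus, hft71_skus):
    """Check WooCommerce SKUs against the HFT71 SKU list.

    woo_skus: product ID -> SKU ("" or None when the product has no SKU).
    hft71_skus: iterable of SKUs known to HFT71.
    """
    exact = set(hft71_skus)
    # Normalized form -> original HFT71 SKU, to catch case/whitespace slips.
    loose = {sku.strip().casefold(): sku for sku in exact}
    report = {"missing_sku": [], "near_match": {}, "unknown": []}
    for product_id, sku in woo_skus.items():
        if not sku:
            report["missing_sku"].append(product_id)
        elif sku in exact:
            continue
        elif sku.strip().casefold() in loose:
            report["near_match"][product_id] = loose[sku.strip().casefold()]
        else:
            report["unknown"].append(product_id)
    return report
```

Products listed under near_match never sync, because matching is exact. Correcting those SKUs in WooCommerce is usually the quickest fix.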

Mitigation

  • Run controlled manual sync.
  • Prioritize high-volume SKUs first.
  • If drift persists, temporarily increase sync frequency, keeping the interval long enough to avoid rate limiting (429).
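Prioritizing high-volume SKUs amounts to ordering the sync by sales volume and working through it in batches. In this sketch, units_sold is a hypothetical per-SKU sales figure for whatever recent window the team uses. SKUs without data count as zero sales.

```python
def sync_plan(skus, units_sold, batch_size):
    """Order SKUs by sales volume (highest first) and split into batches.

    units_sold: SKU -> recent sales count (hypothetical metric). SKUs with
    no data count as 0 sales; ties are broken alphabetically for a stable plan.
    """
    ordered = sorted(skus, key=lambda sku: (-units_sold.get(sku, 0), sku))
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]
```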

Recovery validation

  1. Select sample SKUs (high-volume + random).
  2. Compare HFT71 stock vs WooCommerce stock quantity.
  3. Confirm _hft71_stock_available meta updates.
  4. Monitor for one full sync cycle to ensure stability.
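The validation sample and the comparison can be scripted as below. This sketch assumes both stock levels have been exported into SKU-to-quantity maps. A SKU missing on one side appears as None in the result. Passing a seed makes the random part of the sample reproducible for the incident record.

```python
import random


def pick_sample(skus_by_volume, top_n, random_n, seed=None):
    """Top `top_n` SKUs plus `random_n` random others.

    skus_by_volume must already be sorted highest-volume first.
    """
    top = list(skus_by_volume[:top_n])
    rest = list(skus_by_volume[top_n:])
    rng = random.Random(seed)
    return top + rng.sample(rest, min(random_n, len(rest)))


def stock_drift(sample, hft71_stock, woo_stock):
    """SKU -> (HFT71 qty, WooCommerce qty) for every sampled SKU that differs.

    A SKU absent from one side appears with None for that side.
    """
    drift = {}
    for sku in sample:
        remote, local = hft71_stock.get(sku), woo_stock.get(sku)
        if remote != local:
            drift[sku] = (remote, local)
    return drift
```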

6) Rollback decision guide

Consider rollback when:

  • plugin change introduced systematic failures after release
  • order handoff blocked and hotfix is not immediately available
  • data corruption risk is detected

Rollback actions:

  1. Revert plugin to last known-good build.
  2. Re-check plugin activation and settings integrity.
  3. Re-run smoke tests (order send, webhook, stock sync).
  4. Keep incident timeline for postmortem.

7) Post-incident checklist

  • [ ] Incident ticket updated with root cause.
  • [ ] Customer/business communication sent (if required).
  • [ ] Backlog tasks created (preventive fixes, alerts, tests).
  • [ ] CHANGELOG.md updated if behavior changed.
  • [ ] Runbook updated with lessons learned.

Suggested alerts:

  • Burst of failed send_order logs in a 5-minute window.
  • 401 rate above threshold (credential drift).
  • No successful webhook during an expected traffic period.
  • No successful stock_sync within the expected interval window.
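The alert conditions listed above can be evaluated over a window of exported log entries. This is a minimal sketch: the LogEntry shape is hypothetical, as is every threshold default (burst size, 401 rate, quiet periods). Tune these to real traffic before relying on them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class LogEntry:
    time: datetime
    action: str   # e.g. "send_order", "webhook", "stock_sync"
    status: int   # HTTP status; 0 when no response was received
    success: bool


def max_failures_in_window(entries, action, window):
    """Largest count of failed `action` entries within any sliding window."""
    times = sorted(e.time for e in entries if e.action == action and not e.success)
    best, left = 0, 0
    for right, t in enumerate(times):
        while t - times[left] > window:
            left += 1
        best = max(best, right - left + 1)
    return best


def status_rate(entries, status):
    """Fraction of entries with the given HTTP status."""
    return sum(e.status == status for e in entries) / len(entries) if entries else 0.0


def last_success(entries, action):
    """Timestamp of the latest successful `action`, or None."""
    times = [e.time for e in entries if e.action == action and e.success]
    return max(times) if times else None


def evaluate_alerts(entries, now, *, burst_limit=10, auth_rate_limit=0.05,
                    webhook_quiet=timedelta(hours=1),
                    stock_quiet=timedelta(hours=2)):
    """Return the names of alerts that should fire (all defaults hypothetical)."""
    alerts = []
    if max_failures_in_window(entries, "send_order", timedelta(minutes=5)) >= burst_limit:
        alerts.append("send_order failure burst")
    if status_rate(entries, 401) > auth_rate_limit:
        alerts.append("401 rate above threshold")
    for action, quiet in (("webhook", webhook_quiet), ("stock_sync", stock_quiet)):
        last = last_success(entries, action)
        if last is None or now - last > quiet:
            alerts.append(f"no successful {action} within {quiet}")
    return alerts
```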