Incident Runbook
This runbook covers common production incidents for HFT71 Integration:
- API outage
- webhook failures
- stock drift
Use this document during live incidents to speed up detection, mitigation, and recovery.
1) Incident severity and response targets
- SEV-1: Order flow blocked (new processing orders cannot be sent to HFT71).
- SEV-2: Partial degradation (webhook failing while polling still works, or stock sync degraded).
- SEV-3: Minor inconsistency (small stock drift, intermittent retries, no customer impact yet).
Suggested response targets:
- SEV-1: acknowledge within 5 minutes, mitigation within 30 minutes.
- SEV-2: acknowledge within 15 minutes, mitigation within 2 hours.
- SEV-3: acknowledge within 1 business day.
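The Python sketch below turns the targets above into concrete deadlines when an incident is opened. It is illustrative only. The `TARGETS` table, `response_deadlines` and `add_business_days` are hypothetical helpers. "Business day" is assumed to mean Monday to Friday, with public holidays ignored.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

# Response targets from the list above. SEV-3 has no mitigation target, and its
# acknowledgement target is counted in business days (assumed Mon-Fri, no holidays).
TARGETS = {
    "SEV-1": {"ack": timedelta(minutes=5), "mitigate": timedelta(minutes=30)},
    "SEV-2": {"ack": timedelta(minutes=15), "mitigate": timedelta(hours=2)},
}


def add_business_days(start: datetime, days: int) -> datetime:
    """Advance `start` by `days` weekdays, skipping Saturday and Sunday."""
    current = start
    added = 0
    while added < days:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0=Monday ... 4=Friday
            added += 1
    return current


def response_deadlines(severity: str, opened_at: datetime) -> Dict[str, Optional[datetime]]:
    """Return ack/mitigation deadlines for an incident opened at `opened_at`."""
    if severity == "SEV-3":
        return {"ack": add_business_days(opened_at, 1), "mitigate": None}
    return {name: opened_at + delta for name, delta in TARGETS[severity].items()}
```

Consider an incident opened on Friday 2024-03-01 at 10:00 UTC. As SEV-1, it must be acknowledged by 10:05 and mitigated by 10:30. As SEV-3, the acknowledgement deadline rolls over the weekend to Monday at 10:00.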
2) Quick diagnostics
- Open WooCommerce > HFT71 Logi (the HFT71 log screen).
- Check the most recent actions: `send_order`, `get_status`, `webhook`, `stock_sync`, `update_order`, `update_order_address`.
- Note recurring HTTP statuses:
  - `0` or timeout-like responses -> transport/connectivity issue
  - `401`/`403` -> auth/permissions/expired credentials
  - `404` -> endpoint/base URL mismatch
  - `429` -> rate limiting
  - `5xx` -> remote API outage
- Verify plugin settings:
  - API base URL
  - username/password
  - webhook secret
  - polling/stock intervals
- Verify WP-Cron is running in production.
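The status triage above can be applied mechanically when many log entries are involved. This Python sketch makes a few assumptions:

- The log screen's entries have been exported as `(action, status)` pairs. This is a hypothetical format.
- `None` or `0` stands for a timeout or missing response.
- `likely_cause` and `summarize` are illustrative names, not plugin functions.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def likely_cause(status: Optional[int]) -> str:
    """Map a logged HTTP status to the likely cause listed in this runbook."""
    if status is None or status == 0:
        return "transport/connectivity"  # timeout or no response at all
    if 200 <= status <= 299:
        return "ok"
    if status in (401, 403):
        return "auth/permissions/expired credentials"
    if status == 404:
        return "endpoint/base URL mismatch"
    if status == 429:
        return "rate limiting"
    if 500 <= status <= 599:
        return "remote API outage"
    return "other"


def summarize(entries: Iterable[Tuple[str, Optional[int]]]) -> Counter:
    """Count (action, likely cause) pairs, e.g. ("send_order", "remote API outage")."""
    return Counter((action, likely_cause(status)) for action, status in entries)
```

`summarize(entries).most_common(5)` shows at a glance which action is failing and why.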
3) Scenario A - API outage
Symptoms
- `send_order` and/or `get_status` failures with timeouts or `5xx` responses.
- New `processing` orders are not getting `_hft71_order_id`.
Immediate mitigation
- Confirm incident with HFT71 provider/status channel.
- Keep WooCommerce checkout active unless business decides otherwise.
- Inform fulfillment/ops that external handoff is delayed.
- Avoid repeated manual resubmissions that can create duplicates.
Recovery steps
- Confirm API availability restored (successful auth + test call).
- Identify affected WooCommerce orders:
  - `processing` orders without `_hft71_order_id`, or
  - orders with failed `send_order` logs in the outage time window.
- Re-send impacted orders in controlled batches.
- Confirm each order receives `_hft71_order_id`.
- Monitor logs for 30-60 minutes for renewed failures.
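The selection and batching steps above might be sketched as follows. It assumes:

- Order and log records are hypothetical dicts exported from WooCommerce and the plugin log, not the plugin's real storage format.
- `send` is a placeholder for whatever triggers `send_order` for one order.

Orders that already carry `_hft71_order_id` are skipped, in line with the warning against duplicate resubmissions.

```python
from datetime import datetime
from typing import Callable, Dict, Iterable, Iterator, List, Optional


def find_affected_orders(orders: List[Dict], failed_send_logs: Iterable[Dict],
                         window_start: datetime, window_end: datetime) -> List[Dict]:
    """Return orders matching either criterion in the recovery steps.

    Hypothetical shapes: orders are {"id", "status", "meta": {...}};
    failed_send_logs are {"order_id", "time"} for failed send_order entries.
    """
    failed_in_window = {
        entry["order_id"]
        for entry in failed_send_logs
        if window_start <= entry["time"] <= window_end
    }
    return [
        order for order in orders
        if (order["status"] == "processing" and not order["meta"].get("_hft71_order_id"))
        or order["id"] in failed_in_window
    ]


def batches(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def resend(affected: List[Dict], send: Callable[[Dict], None], batch_size: int = 10,
           after_batch: Optional[Callable[[List[int]], None]] = None) -> List[int]:
    """Re-send orders batch by batch; return the ids that were actually sent."""
    sent: List[int] = []
    for batch in batches(affected, batch_size):
        batch_sent = []
        for order in batch:
            if order["meta"].get("_hft71_order_id"):
                continue  # already handed off; re-sending could create a duplicate
            send(order)
            batch_sent.append(order["id"])
        sent.extend(batch_sent)
        if after_batch is not None:
            after_batch(batch_sent)  # e.g. check logs or pause before the next batch
    return sent
```

Use the `after_batch` hook to pause and check the log before continuing. The records are a snapshot, so it is safer in practice to re-read each order's meta immediately before sending.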
Escalation data to collect
- outage start/end time (UTC)
- affected order count
- representative request IDs/log entries
- top failing endpoints/status codes
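The following sketch assembles the escalation fields above from failed log entries. Each entry is assumed to be a hypothetical dict with these keys:

- an aware `time`
- an `endpoint`
- a `status` (or `None` for timeouts)
- a `request_id`

The first and last failure times only approximate the outage window. Confirm the real start and end against the provider's communication.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Dict, List


def escalation_summary(failures: List[Dict], affected_order_ids: List[int],
                       top_n: int = 3) -> Dict:
    """Collect outage window (UTC), affected count, sample request IDs, top failures."""
    times = sorted(f["time"].astimezone(timezone.utc) for f in failures)
    pairs = Counter((f["endpoint"], f["status"]) for f in failures)
    return {
        "outage_start_utc": times[0].isoformat() if times else None,
        "outage_end_utc": times[-1].isoformat() if times else None,
        "affected_order_count": len(set(affected_order_ids)),
        "sample_request_ids": [f["request_id"] for f in failures[:top_n]],
        "top_failures": pairs.most_common(top_n),  # [((endpoint, status), count), ...]
    }
```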
4) Scenario B - Webhook failures
Symptoms
- HFT71 reports webhook delivery failures.
- Order status is not updating in WooCommerce via the webhook path.
- `webhook` log entries are missing or rejected.
Immediate checks
- Verify the endpoint URL: `/wp-json/hft71/v1/order-status`.
- If a secret is enabled, verify an exact match in either:
  - the `X-HFT71-Webhook-Secret` header, or
  - the `webhook_secret` body field.
- Confirm HTTPS certificate validity and that no WAF/CDN rule is blocking requests.
- Confirm requests reach origin (web server access/error logs).
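This Python sketch illustrates what an "exact match" means for the secret check. It is not the plugin's own code, and it makes two assumptions:

- The header takes precedence over the body field.
- An empty configured secret means the check is disabled.

HTTP header names are case-insensitive, but the secret value is compared byte-for-byte with `hmac.compare_digest`. A stray space or newline copied into either side is therefore enough to cause a mismatch.

```python
import hmac
from typing import Mapping


def webhook_secret_matches(expected: str, headers: Mapping[str, str],
                           body: Mapping[str, object]) -> bool:
    """Check X-HFT71-Webhook-Secret (header) or webhook_secret (body) against `expected`."""
    if not expected:
        return True  # assumption: no configured secret means the check is disabled
    lowered = {name.lower(): value for name, value in headers.items()}
    candidate = lowered.get("x-hft71-webhook-secret", body.get("webhook_secret"))
    if not isinstance(candidate, str):
        return False  # no secret supplied in either place
    # Constant-time comparison; the value must match exactly, including whitespace.
    return hmac.compare_digest(candidate.encode("utf-8"), expected.encode("utf-8"))
```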
Mitigation
- Keep polling enabled (`hft71_poll_order_status`) as a fallback.
- If a webhook auth mismatch is suspected, rotate the secret and update it on both sides.
Recovery validation
- Send a controlled webhook test payload.
- Verify the response is HTTP 200 with `"success": true` when the mapping/order exists.
- Verify the corresponding order status and the `_hft71_last_status` update.
- Verify a log entry with action `webhook`.
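A controlled test payload can be sent from any machine with Python 3. In this sketch the payload fields (`order_id`, `status`) are hypothetical placeholders, because the HFT71 webhook schema is not documented here. Substitute the fields the provider actually sends, and reference an order that already has an HFT71 mapping so that `"success": true` is expected. The secret goes in the `X-HFT71-Webhook-Secret` header.

```python
import json
import urllib.request


def build_test_webhook(site_url: str, payload: dict, secret: str = "") -> urllib.request.Request:
    """Build (but do not send) a JSON POST to the order-status webhook route."""
    headers = {"Content-Type": "application/json"}
    if secret:
        headers["X-HFT71-Webhook-Secret"] = secret
    return urllib.request.Request(
        site_url.rstrip("/") + "/wp-json/hft71/v1/order-status",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def webhook_succeeded(status: int, body: bytes) -> bool:
    """True only for HTTP 200 with a JSON object containing "success": true."""
    if status != 200:
        return False
    try:
        data = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return isinstance(data, dict) and data.get("success") is True
```

To send the request:

1. Pass it to `urllib.request.urlopen(req, timeout=10)`.
2. Check the result with `webhook_succeeded(resp.status, resp.read())`.

`urlopen` raises `urllib.error.HTTPError` for non-2xx responses, so a rejected secret shows up as an exception rather than a `False` result.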
5) Scenario C - Stock drift
Stock drift means WooCommerce stock differs from HFT71 source values.
Symptoms
- Product quantities in WooCommerce do not match `GET /stock/available`.
- Frequent oversell/out-of-stock mismatch reports.
Immediate checks
- Confirm `hft71_stock_sync_enabled` is true.
- Confirm SKU integrity:
  - the WooCommerce SKU exists
  - it exactly matches the HFT71 SKU
- Confirm stock sync interval and cron execution.
- Run manual stock sync from settings.
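The SKU integrity check can be scripted against exported lists. This sketch assumes WooCommerce products arrive as hypothetical `{"id", "sku"}` dicts. HFT71 SKUs are assumed to be plain strings, for example collected from the `GET /stock/available` response. The sketch reports three groups:

- products with no SKU
- SKUs that differ from an HFT71 SKU only by case or surrounding whitespace, which an exact-match sync would miss
- SKUs unknown to HFT71

```python
from typing import Dict, Iterable, List


def sku_integrity_report(woo_products: Iterable[Dict], hft71_skus: Iterable[str]) -> Dict[str, List]:
    """Flag products an exact-SKU stock sync would skip or fail to match."""
    remote = set(hft71_skus)
    remote_normalized = {s.strip().lower(): s for s in remote}
    report: Dict[str, List] = {"missing_sku": [], "near_miss": [], "unknown_sku": []}
    for product in woo_products:
        sku = product.get("sku") or ""
        if not sku:
            report["missing_sku"].append(product["id"])
        elif sku in remote:
            continue  # exact match: nothing to fix
        elif sku.strip().lower() in remote_normalized:
            # Differs only by case/whitespace, so an exact comparison fails.
            report["near_miss"].append((product["id"], sku, remote_normalized[sku.strip().lower()]))
        else:
            report["unknown_sku"].append((product["id"], sku))
    return report
```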
Mitigation
- Run controlled manual sync.
- Prioritize high-volume SKUs first.
- If drift persists, temporarily increase sync frequency, keeping the interval long enough to avoid `429` rate limiting.
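One way to prioritize high-volume SKUs is to compute the drifted SKUs and sort them by sales. In this sketch all three inputs are hypothetical SKU-keyed dicts. `sales_volume` stands for whatever volume measure the team already uses. SKUs missing on either side are left to the SKU integrity check.

```python
from typing import Dict, List, Tuple


def drift_by_priority(woo_stock: Dict[str, int], hft71_stock: Dict[str, int],
                      sales_volume: Dict[str, int]) -> List[Tuple[str, int, int]]:
    """Return (sku, woo_qty, hft71_qty) for drifted SKUs, highest volume first."""
    drifted = [
        (sku, woo_stock[sku], hft71_stock[sku])
        for sku in woo_stock.keys() & hft71_stock.keys()  # only SKUs known to both sides
        if woo_stock[sku] != hft71_stock[sku]
    ]
    # Highest volume first; ties broken by SKU so the order is repeatable.
    drifted.sort(key=lambda row: (-sales_volume.get(row[0], 0), row[0]))
    return drifted
```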
Recovery validation
- Select sample SKUs (high-volume + random).
- Compare HFT71 stock vs WooCommerce stock quantity.
- Confirm `_hft71_stock_available` meta updates.
- Monitor for one full sync cycle to ensure stability.
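For the validation sample, this sketch takes the highest-volume SKUs plus a seeded random pick from the rest. It then lists the sampled SKUs whose quantities still differ. The input shapes are assumptions. The fixed seed only makes the sample reproducible in the incident notes.

```python
import random
from typing import Dict, List


def validation_sample(skus_by_volume: List[str], top_n: int = 5, random_n: int = 5,
                      seed: int = 0) -> List[str]:
    """Top `top_n` SKUs (list must be sorted by volume, highest first) plus random others."""
    top = skus_by_volume[:top_n]
    rest = skus_by_volume[top_n:]
    rng = random.Random(seed)  # seeded so the same sample can be re-checked later
    return top + rng.sample(rest, min(random_n, len(rest)))


def mismatches(sample: List[str], woo_stock: Dict[str, int], hft71_stock: Dict[str, int]) -> List[str]:
    """SKUs in the sample whose WooCommerce quantity differs from HFT71."""
    return [sku for sku in sample if woo_stock.get(sku) != hft71_stock.get(sku)]
```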
6) Rollback decision guide
Consider rollback when:
- plugin change introduced systematic failures after release
- order handoff blocked and hotfix is not immediately available
- data corruption risk is detected
Rollback actions:
- Revert plugin to last known-good build.
- Re-check plugin activation and settings integrity.
- Re-run smoke tests (order send, webhook, stock sync).
- Keep incident timeline for postmortem.
7) Post-incident checklist
- [ ] Incident ticket updated with root cause.
- [ ] Customer/business communication sent (if required).
- [ ] Backlog tasks created (preventive fixes, alerts, tests).
- [ ] `CHANGELOG.md` updated if behavior changed.
- [ ] Runbook updated with lessons learned.
8) Suggested alerts
- Burst of failed `send_order` logs in a 5-minute window.
- `401` rate above threshold (credential drift).
- No successful `webhook` for the expected traffic period.
- No successful `stock_sync` within the expected interval window.
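The four alerts above reduce to simple checks. These could run in whatever monitoring tool ingests the plugin logs. The Python sketch below is an illustration built on assumptions. The default threshold of 5 failures per 5 minutes and the input shapes are placeholders to tune, not values from this runbook.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional


def failure_burst(failure_times: Iterable[datetime], window: timedelta = timedelta(minutes=5),
                  threshold: int = 5) -> bool:
    """True if any sliding `window` contains at least `threshold` failed send_order entries."""
    times = sorted(failure_times)
    left = 0
    for right, current in enumerate(times):
        while current - times[left] > window:  # shrink window from the left
            left += 1
        if right - left + 1 >= threshold:
            return True
    return False


def auth_failure_rate(statuses: Iterable[Optional[int]]) -> float:
    """Share of responses that were 401, for the credential-drift alert."""
    statuses = list(statuses)
    return statuses.count(401) / len(statuses) if statuses else 0.0


def is_stale(last_success: Optional[datetime], now: datetime, max_gap: timedelta) -> bool:
    """True if there was no successful webhook/stock_sync within `max_gap` (or ever)."""
    return last_success is None or now - last_success > max_gap
```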