Retries & delivery log

The gateway retries failed deliveries on a fixed backoff schedule whose delays grow from 1 minute to 24 hours. If the seventh attempt also fails, the delivery is marked permanently failed and must be requeued manually if you still want it delivered.

What counts as success / failure

Outcome from your endpoint | Treated as
HTTP 2xx within 10 seconds | Success
HTTP non-2xx (any 3xx / 4xx / 5xx) | Failure
Request times out at 10 seconds | Failure
TLS handshake error, DNS failure, connection refused | Failure

The transport-level timeout is hard-coded at 10 seconds. Design your receiver to acknowledge quickly: persist the event to your own queue, return 200, and process it asynchronously.
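A minimal sketch of the acknowledge-fast pattern, assuming a Python receiver. The in-process queue, the `handle_delivery` function, and the sample payload are hypothetical stand-ins; a real receiver would persist to durable storage (a database table or a message broker) before answering.

```python
import queue
import threading

# Hypothetical in-process queue. In production, use durable storage so that
# events survive a restart of the receiver.
events = queue.Queue()
processed = []


def handle_delivery(body: bytes) -> int:
    """Enqueue the raw body and return 200 right away."""
    events.put(body)
    return 200


def worker() -> None:
    """Drain the queue in the background, outside the 10-second window."""
    while True:
        body = events.get()
        processed.append(body)  # stand-in for the real, slower business logic
        events.task_done()


threading.Thread(target=worker, daemon=True).start()

status = handle_delivery(b'{"id": "dlv_123"}')  # sample payload, shape assumed
events.join()  # wait for the background work (only needed in this demo)
```

The handler does no work beyond the enqueue, so its response time does not depend on how long the business logic takes.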

Backoff schedule

Every failure schedules the next attempt at a fixed delay from the previous one:

Attempt | Delay since previous | Cumulative time after first attempt
1 | none (initial delivery) | 0
2 | + 1 minute | 1 m
3 | + 5 minutes | 6 m
4 | + 30 minutes | 36 m
5 | + 2 hours | 2 h 36 m
6 | + 12 hours | 14 h 36 m
7 | + 24 hours | 38 h 36 m

After attempt 7 fails, the delivery moves to status='failed' (dead). The gateway does not auto-revive it — you'd need to either ship a fix to your receiver and manually requeue, or accept the loss.
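The schedule can be expressed as a small lookup, which is handy if you want to predict when a given delivery will be retried or alert before it goes dead. This is only an illustrative sketch: the delays come from the table above, while `RETRY_DELAYS`, `MAX_ATTEMPTS`, and `next_retry_at` are names invented here, not gateway identifiers.

```python
from datetime import datetime, timedelta, timezone

# Delays before attempts 2..7, copied from the schedule table.
RETRY_DELAYS = [
    timedelta(minutes=1),
    timedelta(minutes=5),
    timedelta(minutes=30),
    timedelta(hours=2),
    timedelta(hours=12),
    timedelta(hours=24),
]
MAX_ATTEMPTS = len(RETRY_DELAYS) + 1  # 7


def next_retry_at(attempts_made: int, last_attempt: datetime):
    """Return when the next attempt is due, or None once the delivery is dead.

    attempts_made counts completed (failed) attempts, from 1 to 7.
    """
    if not 1 <= attempts_made <= MAX_ATTEMPTS:
        raise ValueError("attempts_made must be between 1 and 7")
    if attempts_made == MAX_ATTEMPTS:
        return None  # attempt 7 failed: no further automatic retry
    return last_attempt + RETRY_DELAYS[attempts_made - 1]


# Walk the whole schedule, assuming every attempt fails immediately.
first = datetime(2024, 1, 1, tzinfo=timezone.utc)
when, n = first, 1
while (due := next_retry_at(n, when)) is not None:
    when, n = due, n + 1
total = when - first
```

Walking the schedule this way gives a `total` of 38 hours 36 minutes between the first and the seventh attempt, matching the last row of the table.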

Why this schedule

The pattern is "fast then slow" on purpose:

  • The first two retries (attempts 2 and 3, within 6 minutes) catch transient network blips and deploys.
  • The mid-range delays (30 m, 2 h) handle longer outages or partial deploys.
  • The long tail (12 h, 24 h) gives someone on-call enough time to actually notice and fix a receiver before the delivery is lost forever.

Dedupe by envelope id

Every retry of the same delivery uses the same envelope id (the delivery record's id) and the same X-CPG-Delivery header value. The body is byte-for-byte identical across retries; only the X-CPG-Timestamp and the signature change, because the signature includes the timestamp.

Your receiver should treat id as a primary key: process each id once, return 2xx immediately on duplicates. Otherwise you risk double-crediting users when a retry arrives after you've already processed the original.
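One way to get "process each id once" is to let a database primary key do the deduplication. The sketch below uses SQLite from the Python standard library; the table names, the `handle` function, and the idea that an event credits a user by an integer amount are assumptions made for illustration only.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed_deliveries (id TEXT PRIMARY KEY)")
db.execute("CREATE TABLE balances (user_id TEXT PRIMARY KEY, amount INTEGER NOT NULL)")
db.execute("INSERT INTO balances VALUES ('user_1', 0)")
db.commit()


def handle(delivery_id: str, user_id: str, amount: int) -> int:
    """Credit the user at most once per delivery id; always answer 200."""
    with db:  # one transaction: the marker row and the credit commit together
        cur = db.execute(
            "INSERT OR IGNORE INTO processed_deliveries (id) VALUES (?)",
            (delivery_id,),
        )
        if cur.rowcount == 1:  # the id was new, so this is the first delivery
            db.execute(
                "UPDATE balances SET amount = amount + ? WHERE user_id = ?",
                (amount, user_id),
            )
    return 200  # duplicates are acknowledged without being re-applied


first_status = handle("dlv_1", "user_1", 500)
retry_status = handle("dlv_1", "user_1", 500)  # a retry of the same delivery
balance = db.execute(
    "SELECT amount FROM balances WHERE user_id = 'user_1'"
).fetchone()[0]
```

Recording the id and applying the credit in the same transaction means a crash between the two steps cannot leave an id marked as processed without its effect, or the reverse.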

Delivery log UI

Your dashboard's Webhook deliveries page shows every delivery, updated in real time as attempts are made:

  • Pending / Delivering / Delivered / Failed (dead): current status. A row waiting for a retry shows as Pending (with a non-zero attempt count and a future retry time); there is no distinct "retrying" status.
  • Event type and timestamp: what fired and when.
  • Most recent response: the HTTP status code of the last attempt (e.g. 500), or blank when the request never received a response (DNS / TLS / connection / timeout). The response body is not stored; only the status code is retained.
  • Attempt count: 0 / 7 for a fresh delivery awaiting its first attempt (the counter increments after each attempt); 7 / 7 for a permanently failed one.
  • Retry button: visible on failed (dead) rows. Clicking it resets the row to pending with attempt_count=0, clears the previous response, and sets an immediate next_retry_at. The next worker tick (~10 s) picks it up.
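The effect of the Retry button can be pictured as a small state reset. The model below is purely illustrative and is not the gateway's code: `status`, `attempt_count`, and `next_retry_at` are named in the text above, while `DeliveryRow`, `last_response_status`, and `requeue` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class DeliveryRow:
    id: str
    status: str                          # 'pending', 'delivering', 'delivered', 'failed'
    attempt_count: int                   # 0..7
    last_response_status: Optional[int]  # hypothetical field; None when blank
    next_retry_at: Optional[datetime]


def requeue(row: DeliveryRow, now: datetime) -> DeliveryRow:
    """Model of the Retry button: restart the delivery from scratch."""
    if row.status != "failed":
        raise ValueError("only permanently failed rows can be requeued")
    row.status = "pending"
    row.attempt_count = 0            # all 7 automatic attempts are available again
    row.last_response_status = None  # previous response is cleared
    row.next_retry_at = now          # due immediately; the next worker tick picks it up
    return row


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
row = requeue(DeliveryRow("dlv_1", "failed", 7, 500, None), now)
```

Note that the row keeps its id, which is what lets a receiver that dedupes by id recognise a manually requeued delivery.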

A "manual" retry uses the same body bytes as the original — so if your receiver has already processed that id, it will dedupe correctly without any extra work on your end.

When to manually requeue

The retry button is most useful in these scenarios:

  • You shipped a fix to your receiver after the delivery had already exhausted automatic retries. Click retry and the original event lands again.
  • You wiped your local event store and want to re-process recent events. Delivery rows are not automatically pruned, so you can requeue an old delivery as long as its webhook still exists.
  • You're debugging during development and want to repeatedly receive the same event for testing.

When a delivery is not delivered

If a transient infrastructure issue prevents an event from being enqueued at the moment it fires, that event may be dropped — the gateway favours availability over delivery guarantees. The transaction itself is always recorded, so your /v1/transactions poll will still surface the deposit/withdrawal. For mission-critical accounting, treat webhook events as advisories and reconcile against /v1/transactions (or the per-resource GETs) on a slow loop.
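The reconciliation step can be as simple as a set difference between what the poll returns and what your receiver has already recorded. The sketch below deliberately leaves out the HTTP call, because the response shape of /v1/transactions is not described here; it assumes, hypothetically, that each transaction is a dict with an "id" field and that your store is keyed by the same id.

```python
def find_unprocessed(transactions: list, recorded_ids: set) -> list:
    """Return polled transactions that the local store has never recorded."""
    return [tx for tx in transactions if tx["id"] not in recorded_ids]


# Sample data with an assumed shape; a real poll result would come from
# /v1/transactions.
polled = [
    {"id": "tx_1", "type": "deposit"},
    {"id": "tx_2", "type": "withdrawal"},
    {"id": "tx_3", "type": "deposit"},
]
recorded = {"tx_1", "tx_3"}  # ids already handled via webhooks

missing = find_unprocessed(polled, recorded)
missing_ids = [tx["id"] for tx in missing]
```

Anything in `missing` corresponds to an event whose webhook never arrived (or has not arrived yet), so it can be fed through the same idempotent handler your receiver uses for webhook deliveries.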