A Place for Poison
Some jobs fail on every attempt: a malformed payload, a bug in the handler, or a permanently missing record. If they retry forever, they burn capacity and can block the queue. A dead letter queue (DLQ) is where such jobs go after exhausting their retries.
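The retry-then-quarantine flow can be sketched in a few lines. This is a minimal in-memory illustration, not a production worker: the job shape (a dict with `payload` and `attempts`), the `process_all` function, and the `MAX_ATTEMPTS` budget are all assumptions made for the example.

```python
from collections import deque

MAX_ATTEMPTS = 3  # hypothetical retry budget, chosen for illustration


def process_all(main_queue, dead_letters, handler, max_attempts=MAX_ATTEMPTS):
    """Drain main_queue; a job that fails max_attempts times moves to dead_letters."""
    while main_queue:
        job = main_queue.popleft()
        try:
            handler(job["payload"])
        except Exception:
            job["attempts"] += 1
            if job["attempts"] >= max_attempts:
                # Retries exhausted: quarantine the job instead of retrying forever.
                dead_letters.append(job)
            else:
                # Requeue at the back so healthy jobs are not blocked behind it.
                main_queue.append(job)
```

A real system would usually add a delay or backoff between attempts; the sketch omits that to keep the control flow visible.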
What the DLQ Buys You
- Isolation: failing jobs leave the main queue, so healthy jobs keep flowing.
- Visibility: DLQ depth is a clear alarm that something is broken.
- Recovery: you can inspect, fix, and replay jobs once the cause is resolved.
Keep Context
A job in the DLQ should carry the reason it died: the last error, the attempt count, and a timestamp. Without this context, you cannot diagnose the failure later.
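One way to keep that context is to wrap the failed job in a small record at the moment it is quarantined. The sketch below assumes a hypothetical `DeadLetter` dataclass and `to_dead_letter` helper; the field names are illustrative, not taken from any particular queue system.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class DeadLetter:
    """Hypothetical DLQ entry: the original payload plus why it died."""
    payload: dict
    last_error: str
    attempts: int
    failed_at: datetime


def to_dead_letter(payload, exc, attempts):
    """Build a DLQ entry from the final exception and the attempt count."""
    return DeadLetter(
        payload=payload,
        last_error=f"{type(exc).__name__}: {exc}",
        attempts=attempts,
        failed_at=datetime.now(timezone.utc),  # timezone-aware timestamp
    )
```

Storing the exception type alongside its message makes it easier to group entries by cause when triaging later.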
Operate It
- Alert when DLQ depth grows, since a healthy pipeline keeps it near zero.
- Replay carefully after a fix, ideally in small batches.
- Expire ancient entries so the DLQ does not grow forever.
A non-empty DLQ is a signal, not a resting place. Triage it.
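The three operating practices above (alert, replay in batches, expire) might look like this against a simple in-memory DLQ. Everything here is an assumption for illustration: entries are dicts with `payload` and `failed_at` keys, and `should_alert`, `replay_batch`, and `expire_old` are hypothetical helper names.

```python
from collections import deque
from datetime import datetime, timedelta, timezone


def should_alert(dlq, threshold=0):
    """A healthy pipeline keeps the DLQ near zero, so alert above the threshold."""
    return len(dlq) > threshold


def replay_batch(dlq, main_queue, batch_size=10):
    """Move at most batch_size payloads back to the main queue; return the count."""
    moved = 0
    while dlq and moved < batch_size:
        main_queue.append(dlq.popleft()["payload"])
        moved += 1
    return moved


def expire_old(dlq, max_age, now=None):
    """Drop entries older than max_age; return how many were dropped."""
    now = now or datetime.now(timezone.utc)
    kept = [entry for entry in dlq if now - entry["failed_at"] <= max_age]
    dropped = len(dlq) - len(kept)
    dlq.clear()
    dlq.extend(kept)
    return dropped
```

Replaying in small batches limits the damage if the fix turns out to be incomplete: only a few jobs fail again before you notice.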
Key idea
A dead letter queue quarantines jobs that exhaust retries so the main pipeline keeps flowing while failures are inspected and replayed.