
System Design

The Dead Letter Queue for Jobs

Quarantine jobs that exhaust retries so they stop blocking the pipeline.

4 min read · core

A Place for Poison

Some jobs fail every attempt: a malformed payload, a bug in the handler, or a permanently missing record. If they retry forever they burn capacity and can block the queue. A dead letter queue, or DLQ, is where such jobs go after exhausting retries.
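The flow above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production worker: the `MAX_ATTEMPTS` budget, the job dictionary shape, and the `run_worker` and `parse_number` names are all assumptions made for the example.

```python
from collections import deque

MAX_ATTEMPTS = 3  # hypothetical retry budget; real systems make this configurable


def run_worker(main_queue, dead_letters, handler):
    """Drain main_queue, moving jobs that fail MAX_ATTEMPTS times to dead_letters."""
    while main_queue:
        job = main_queue.popleft()
        try:
            handler(job["payload"])
        except Exception as exc:
            job["attempts"] += 1
            job["last_error"] = repr(exc)
            if job["attempts"] >= MAX_ATTEMPTS:
                dead_letters.append(job)  # quarantine: stop retrying this job
            else:
                main_queue.append(job)  # retry later, behind the healthy jobs


def parse_number(payload):
    """Example handler: raises ValueError on a malformed payload."""
    return int(payload)


main_queue = deque(
    {"id": i, "payload": p, "attempts": 0, "last_error": None}
    for i, p in enumerate(["1", "not-a-number", "3"])
)
dead_letters = []
run_worker(main_queue, dead_letters, parse_number)
```

Here the malformed job is retried behind the healthy ones and, after its third failure, lands in `dead_letters` instead of cycling through the main queue forever.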

What the DLQ Buys You

  • Isolation: failing jobs leave the main queue, so healthy jobs keep flowing.
  • Visibility: DLQ depth is a clear alarm that something is broken.
  • Recovery: you can inspect, fix, and replay jobs once the cause is resolved.

Keep Context

A job in the DLQ should carry why it died: the last error, the attempt count, and a timestamp. Without this you cannot diagnose the failure later.
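One way to keep that context is to wrap each dead job in a small record. The sketch below is an assumption about shape rather than a prescribed schema: the `DeadLetter` class, its field names, and the `to_dead_letter` helper are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class DeadLetter:
    """A quarantined job plus the context needed to diagnose it later."""
    job_id: str
    payload: Any
    last_error: str  # why the final attempt failed
    attempts: int  # how many times the job was tried
    died_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )  # when it was moved to the DLQ


def to_dead_letter(job_id, payload, exc, attempts):
    """Build a DeadLetter from the exception raised by the last attempt."""
    return DeadLetter(
        job_id=job_id,
        payload=payload,
        last_error=f"{type(exc).__name__}: {exc}",
        attempts=attempts,
    )


try:
    {}["user_42"]  # simulate a permanently missing record
except KeyError as exc:
    entry = to_dead_letter("job-7", {"user": "user_42"}, exc, attempts=5)
```

Storing the exception type alongside the message makes it easier to group DLQ entries by cause when many jobs die for the same reason.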

Operate It

  • Alert when DLQ depth grows, since a healthy pipeline keeps it near zero.
  • Replay carefully after a fix, ideally in small batches.
  • Expire ancient entries so the DLQ does not grow forever.

A non-empty DLQ is a signal, not a resting place. Triage it.
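The three operating habits above can be sketched as small functions over an in-memory list. The threshold, retention window, batch size, and entry shape are all hypothetical values chosen for illustration; a real queue service would expose its own equivalents.

```python
from datetime import datetime, timedelta, timezone

DEPTH_ALERT_THRESHOLD = 0  # hypothetical: any entry at all is worth a look
MAX_AGE = timedelta(days=14)  # hypothetical retention window
BATCH_SIZE = 2  # hypothetical replay batch size


def should_alert(dlq):
    """Alert when DLQ depth rises above the threshold."""
    return len(dlq) > DEPTH_ALERT_THRESHOLD


def expire_old(dlq, now):
    """Keep only entries that died within MAX_AGE of now."""
    return [e for e in dlq if now - e["died_at"] <= MAX_AGE]


def replay_batch(dlq, handler, batch_size=BATCH_SIZE):
    """Retry the first batch_size entries; return (remaining, replayed_ids)."""
    batch, rest = dlq[:batch_size], dlq[batch_size:]
    still_dead, replayed = [], []
    for entry in batch:
        try:
            handler(entry["payload"])
            replayed.append(entry["id"])
        except Exception as exc:
            still_dead.append({**entry, "last_error": repr(exc)})
    # Entries that failed again go to the back so the next batch sees fresh ones.
    return rest + still_dead, replayed


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
dlq = [
    {"id": "a", "payload": "1", "died_at": now - timedelta(days=30)},
    {"id": "b", "payload": "2", "died_at": now - timedelta(days=1)},
    {"id": "c", "payload": "oops", "died_at": now - timedelta(hours=2)},
    {"id": "d", "payload": "4", "died_at": now - timedelta(hours=1)},
]

alert = should_alert(dlq)
dlq = expire_old(dlq, now)  # drops "a", which is older than MAX_AGE
dlq, replayed = replay_batch(dlq, int)  # tries "b" and "c" only
```

Replaying in a bounded batch limits the damage if the fix turns out to be incomplete: only the jobs in that batch fail again, and they return to the DLQ with a fresh error.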

Key idea

A dead letter queue quarantines jobs that exhaust retries so the main pipeline keeps flowing while failures are inspected and replayed.

Check yourself


1. What is the primary purpose of a dead letter queue?

2. What should a job in the DLQ carry to aid diagnosis?