The poison message problem
Some messages can never be processed: malformed payloads, references to deleted data, or bugs in the handler. If the consumer keeps retrying such a poison message, it blocks the partition and starves every message behind it.
The dead letter queue
A dead letter queue (DLQ) is a separate destination for messages that have failed too many times. Once a message reaches the configured retry limit, the consumer moves it to the DLQ and advances past it, freeing the main stream.
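The retry-then-divert behavior can be sketched with plain in-memory lists standing in for the main stream and the DLQ. This is a minimal illustration, not a client for any particular broker; `consume_with_dlq`, `handler`, and `max_attempts` are hypothetical names chosen for the example.

```python
from typing import Callable, Iterable, List, Tuple


def consume_with_dlq(
    messages: Iterable[str],
    handler: Callable[[str], None],
    max_attempts: int = 3,  # assumed retry limit; tune per workload
) -> Tuple[List[str], List[Tuple[str, str, int]]]:
    """Process messages in order, diverting repeated failures to a DLQ.

    Returns (processed, dead_letters), where each dead letter is a
    (message, failure_reason, attempts) tuple.
    """
    processed: List[str] = []
    dead_letters: List[Tuple[str, str, int]] = []

    for msg in messages:
        last_error = None
        for _attempt in range(max_attempts):
            try:
                handler(msg)
                processed.append(msg)
                break  # success: stop retrying this message
            except Exception as exc:  # broad on purpose for the sketch
                last_error = exc
        else:
            # Every attempt failed: park the message and move on, so the
            # messages behind it are not starved.
            dead_letters.append((msg, repr(last_error), max_attempts))

    return processed, dead_letters
```

A real consumer would usually wait between attempts (for example with exponential backoff) and would commit its offset only after the message has been either handled or written to the DLQ.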
What to attach
- Failure reason: the exception or error code.
- Retry count: how many attempts were made.
- Original metadata: topic, partition, and offset for replay.
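One way to carry these fields is a small envelope wrapped around the original payload. The sketch below assumes a JSON-serializable string payload; the `DeadLetter` class and its field names are illustrative, not a standard schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DeadLetter:
    payload: str         # original message body, unchanged
    failure_reason: str  # exception or error code
    retry_count: int     # how many attempts were made
    topic: str           # original metadata, kept for replay
    partition: int
    offset: int


def to_dead_letter(payload: str, error: Exception, retry_count: int,
                   topic: str, partition: int, offset: int) -> DeadLetter:
    """Build the envelope from a failed message and its last error."""
    reason = f"{type(error).__name__}: {error}"
    return DeadLetter(payload, reason, retry_count, topic, partition, offset)


def serialize(record: DeadLetter) -> str:
    """Encode the envelope as JSON for writing to the DLQ."""
    return json.dumps(asdict(record), sort_keys=True)
```

Keeping the payload untouched and putting the diagnostics beside it means a replay can resend exactly what the producer originally wrote.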
Operating the DLQ
Teams monitor DLQ depth as a health signal, inspect failures, fix the root cause, and optionally replay corrected messages back to the main topic.
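The monitor-and-replay routine might look like the following sketch. The threshold value and the `repair` and `publish` callbacks are assumptions made for illustration; in practice `publish` would wrap the producer client of whichever broker is in use.

```python
from typing import Callable, List, Optional, Tuple


def dlq_needs_attention(depth: int, threshold: int = 100) -> bool:
    """Treat a DLQ depth above the (assumed) threshold as a health alert."""
    return depth > threshold


def replay(
    dead_letters: List[str],
    repair: Callable[[str], Optional[str]],
    publish: Callable[[str], None],
) -> Tuple[int, List[str]]:
    """Republish messages that can be corrected; keep the rest in the DLQ.

    `repair` returns the corrected message, or None if it cannot be fixed.
    Returns (number replayed, messages still dead-lettered).
    """
    remaining: List[str] = []
    replayed = 0
    for msg in dead_letters:
        fixed = repair(msg)
        if fixed is None:
            remaining.append(msg)  # leave for further inspection
        else:
            publish(fixed)  # back to the main topic
            replayed += 1
    return replayed, remaining
```

Replaying only after the root cause is fixed matters: otherwise the same messages fail again and return to the DLQ.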
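Flow
Main topic -> consumer -> handler succeeds: advance to the next message
Main topic -> consumer -> handler fails: retry up to the limit -> DLQ -> inspect and fix -> optional replay to the main topic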
Key idea
A dead letter queue isolates messages that repeatedly fail so one poison message cannot stall the entire stream.