The poison message problem
In a queue-based system, a message that always fails to process is a poison message. If the consumer keeps retrying it, the message can block the queue (in an ordered queue, nothing behind it gets processed) and burn resources indefinitely. The fix is a dead letter queue: a separate queue where messages go after they exceed a retry limit.
How it works
The consumer tries to process a message up to a fixed number of times. Each failed delivery increments a delivery count. Once the count reaches the retry limit, the broker moves the message to the dead letter queue instead of redelivering it. The main flow keeps moving while the bad message waits aside for inspection.
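The retry-then-dead-letter loop can be sketched in a few lines of Python. This is a toy in-memory model, not any real broker's API: `ToyBroker`, `deliver_next`, and `max_deliveries` are hypothetical names. Requeueing a failed message at the tail is one assumed policy; real brokers may redeliver immediately or after a visibility timeout.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    body: str
    delivery_count: int = 0  # incremented on every delivery attempt


class ToyBroker:
    """Hypothetical in-memory broker illustrating dead-lettering."""

    def __init__(self, max_deliveries: int = 3) -> None:
        self.max_deliveries = max_deliveries  # hypothetical retry limit
        self.main = deque()
        self.dead_letter = deque()

    def publish(self, body: str) -> None:
        self.main.append(Message(body))

    def deliver_next(self, handler: Callable[[str], None]) -> None:
        """Deliver the head of the main queue once and route it by outcome."""
        msg = self.main.popleft()
        msg.delivery_count += 1
        try:
            handler(msg.body)
        except Exception:
            if msg.delivery_count >= self.max_deliveries:
                # Retry limit reached: park it so the main flow keeps moving.
                self.dead_letter.append(msg)
            else:
                # Redeliver later; requeue at the tail so other messages get a turn.
                self.main.append(msg)
        # On success the message is simply dropped (acknowledged).


# Usage: one message always fails and ends up in the dead letter queue.
processed = []


def handler(body: str) -> None:
    if body == "poison":
        raise ValueError("cannot process")  # simulates a permanent failure
    processed.append(body)


broker = ToyBroker(max_deliveries=3)
for body in ["a", "poison", "b"]:
    broker.publish(body)

while broker.main:
    broker.deliver_next(handler)
```

Here `"a"` and `"b"` are processed, while `"poison"` is delivered three times and then lands in `broker.dead_letter`, so it never stalls the other messages.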
Operating a dead letter queue
- Alert when the dead letter queue is non-empty, because any message there means something is broken.
- Inspect failed messages to find the cause, such as a bad schema or a missing record.
- Redrive the affected messages back to the main queue once the underlying bug is fixed.
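The three operating tasks can be sketched with plain Python deques standing in for the two queues. The function names, the message dict layout, and the `error` field (assumed to be recorded by the consumer when it gives up on a message) are all hypothetical; a real deployment would rely on the broker's own metrics, inspection, and redrive tooling.

```python
from collections import Counter, deque


def should_alert(dead_letter: deque) -> bool:
    """Any message in the dead letter queue means something is broken."""
    return len(dead_letter) > 0


def summarize_failures(dead_letter: deque) -> Counter:
    """Count dead-lettered messages by the (hypothetical) recorded error."""
    return Counter(msg["error"] for msg in dead_letter)


def redrive(dead_letter: deque, main: deque) -> int:
    """Move all dead-lettered messages back to the main queue.

    The delivery count is reset so a redriven message gets a full set of
    retries instead of being dead-lettered again on its first failure.
    """
    moved = 0
    while dead_letter:
        msg = dead_letter.popleft()
        main.append({"body": msg["body"], "delivery_count": 0})
        moved += 1
    return moved


# Usage with made-up sample messages.
main_queue = deque()
dlq = deque([
    {"body": "order-1", "delivery_count": 3, "error": "bad schema"},
    {"body": "order-2", "delivery_count": 3, "error": "missing record"},
])

alert = should_alert(dlq)           # True: page the on-call engineer
failures = summarize_failures(dlq)  # points the investigation at the causes
# ... after the bug is fixed and deployed:
moved = redrive(dlq, main_queue)
```

Resetting `delivery_count` on redrive is a design choice: keeping the old count would send each message straight back to the dead letter queue on its next failure. Redriving everything is also a simplification; if only some causes were fixed, filter on the recorded error before moving messages.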
A dead letter queue that no one watches is just a silent data-loss bin.
Key idea
A dead letter queue isolates poison messages so the main flow keeps moving, but only helps if someone watches and redrives it.