System Design

Poison Message Handling

Stopping one unprocessable message from being retried forever and stalling the whole queue.

The poison message problem

A poison message is one that fails every time it is processed, no matter how often it is retried, typically because it is malformed or triggers a bug in the consumer. Under at-least-once delivery, the broker redelivers any message that was not acknowledged, so a poison message can loop forever. Worse, if it sits at the head of an ordered queue, it blocks every message behind it.
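To make the blocking concrete, here is a minimal simulation in Python. It is a sketch, not a real broker: `handle`, `consume_without_cap`, and the `max_attempts_observed` bound are hypothetical names introduced for illustration, and the bound exists only so the demonstration terminates.

```python
from collections import deque


def handle(message):
    # Hypothetical handler: it can only process integers.
    if not isinstance(message, int):
        raise ValueError(f"cannot process {message!r}")
    return message * 2


def consume_without_cap(messages, max_attempts_observed=5):
    """Consume an ordered queue with no retry cap.

    max_attempts_observed only bounds this simulation; a broker with
    no retry limit would keep redelivering the failed message.
    """
    queue = deque(messages)
    processed = []
    attempts = 0
    while queue and attempts < max_attempts_observed:
        attempts += 1
        head = queue[0]
        try:
            processed.append(handle(head))
            queue.popleft()  # acknowledged: remove from the queue
        except ValueError:
            # Not acknowledged: the message stays at the head and is
            # redelivered on the next iteration.
            continue
    return processed, list(queue)


processed, stuck = consume_without_cap([1, "bad", 2, 3])
print(processed)  # [2]
print(stuck)      # ['bad', 2, 3]
```

The valid messages 2 and 3 are never reached, because every delivery attempt is spent on the message in front of them.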

How to defuse it

  • Track a delivery count or retry count per message.
  • After a threshold, stop retrying and move the message to a dead-letter queue for later inspection.
  • Add backoff between retries so a transient failure has time to recover and the retries do not hammer the system.
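The three steps above can be sketched together in a few lines of Python. This is an in-memory illustration under stated assumptions, not any particular broker's API: `Envelope`, `backoff_delay`, `consume`, and the default limits are hypothetical, and the `sleep` parameter is injectable so the example can record delays instead of actually waiting.

```python
import time
from collections import deque
from dataclasses import dataclass


@dataclass
class Envelope:
    body: object
    delivery_count: int = 0  # incremented on every delivery attempt


def backoff_delay(delivery_count, base=0.1, cap=5.0):
    """Exponential backoff: base, 2*base, 4*base, ... up to cap seconds."""
    return min(cap, base * (2 ** (delivery_count - 1)))


def consume(bodies, handler, max_deliveries=3, sleep=time.sleep):
    """Process messages in order, dead-lettering after max_deliveries."""
    queue = deque(Envelope(b) for b in bodies)
    done, dead_letters = [], []
    while queue:
        env = queue.popleft()
        env.delivery_count += 1
        try:
            done.append(handler(env.body))
        except Exception:
            if env.delivery_count >= max_deliveries:
                # Threshold reached: park the message for inspection
                # so the rest of the queue can proceed.
                dead_letters.append(env)
            else:
                sleep(backoff_delay(env.delivery_count))
                queue.appendleft(env)  # retry at the head to keep order
    return done, dead_letters


def handler(body):
    # Hypothetical handler that always rejects the payload "bad".
    if body == "bad":
        raise ValueError("malformed payload")
    return body.upper()


delays = []  # record backoff delays instead of sleeping
done, dead = consume(["a", "bad", "b"], handler, sleep=delays.append)
print(done)                    # ['A', 'B']
print([e.body for e in dead])  # ['bad']
print(delays)                  # [0.1, 0.2]
```

The poison message is attempted three times, with a growing pause between attempts, and then set aside. The message behind it is processed normally.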

Distinguishing causes

Not every failure means the message is poison. A database being briefly down is transient and worth retrying. A schema violation is permanent, so retrying cannot help and the message should be dead-lettered immediately. Good handlers separate the two: they retry transient errors a few times but route permanent ones straight to the dead-letter queue.
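One way to express this split in code is to give the two kinds of failure distinct exception types. The sketch below assumes the handler can classify its own errors. `TransientError`, `PermanentError`, and `route` are hypothetical names, and backoff is left out to keep the example short.

```python
class TransientError(Exception):
    """Hypothetical marker: failure may clear on its own (database down)."""


class PermanentError(Exception):
    """Hypothetical marker: retrying cannot help (schema violation)."""


def route(body, handler, max_attempts=3):
    """Return ("ok", result) or ("dead_letter", reason)."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return "ok", handler(body)
        except PermanentError as exc:
            # No point retrying: dead-letter on the first failure.
            return "dead_letter", f"permanent: {exc}"
        except TransientError as exc:
            last_error = exc  # retry until attempts run out
    return "dead_letter", f"retries exhausted: {last_error}"


calls = {"n": 0}


def flaky(body):
    # Fails twice with a transient error, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("database unavailable")
    return body


def strict(body):
    # Rejects any payload without an "id" field.
    if "id" not in body:
        raise PermanentError("missing required field 'id'")
    return body


print(route({"id": 1}, flaky))
# ('ok', {'id': 1})
print(route({"name": "x"}, strict))
# ('dead_letter', "permanent: missing required field 'id'")
```

The flaky handler succeeds on its third attempt, while the schema violation is dead-lettered after a single attempt and uses up no retries.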

Key idea

Cap retries with a delivery count and dead-letter the message once the cap is reached, so a single poison message cannot loop forever or block the queue.

Check yourself

1. Why is a poison message dangerous in an ordered queue?

2. What stops a poison message from being retried forever?