System Design

Delivery Retry And Fallback Channel

Retrying transient failures and switching channels when one provider keeps failing.

6 min read · advanced

Delivery can fail

Providers time out, tokens expire, and networks drop. A robust service treats a send as best effort and reacts to failure rather than assuming success.

Retry strategy

  • Transient errors like timeouts get retried with exponential backoff and jitter.
  • Permanent errors like an invalid number are not retried; they mark that channel dead for the recipient.
  • A max attempts cap stops infinite loops.
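
The three rules above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the exception classes, the function names, the base delay of 0.5 s, the 30 s ceiling, and the default of four attempts are all assumptions made for the example. The jitter variant shown picks a random delay between zero and the exponential ceiling.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical: a failure worth retrying, such as a timeout."""


class PermanentError(Exception):
    """Hypothetical: a failure retrying cannot fix, such as an invalid number."""


def backoff_delay(attempt, base=0.5, cap=30.0):
    """Random delay in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def send_with_retry(send, max_attempts=4, sleep=time.sleep):
    """Call send() until it succeeds, fails permanently, or attempts run out.

    Returns "delivered", "dead", or "exhausted".
    """
    for attempt in range(max_attempts):
        try:
            send()
            return "delivered"
        except PermanentError:
            # Retrying cannot help, so stop and report the channel as dead.
            return "dead"
        except TransientError:
            # Wait before the next attempt, but not after the last one.
            if attempt + 1 < max_attempts:
                sleep(backoff_delay(attempt))
    return "exhausted"
```

Passing `sleep` as a parameter is a design choice for the sketch: it lets a test record the delays instead of actually waiting.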

Fallback channels

When a preferred channel keeps failing, the service can fall back to another: for example, if push keeps failing, send SMS. Fallback respects user preferences and message priority, so it is not used for low-value messages.
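
One way to picture the fallback rule is as a channel plan built per message. The sketch below is illustrative only: the `Message` type, the "high" and "low" priority labels, and the `attempt_channel` callable (which returns True on success and would typically wrap the service's per-channel retry loop) are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Message:
    body: str
    priority: str  # "high" or "low" (hypothetical labels)


def channels_to_try(message, preferred, user_allowed, fallback_order):
    """Return the ordered list of channels to attempt for this message."""
    # Start with the preferred channel, if the user permits it.
    plan = [preferred] if preferred in user_allowed else []
    # Only high-priority messages earn fallback channels.
    if message.priority == "high":
        plan += [c for c in fallback_order if c in user_allowed and c not in plan]
    return plan


def deliver(message, plan, attempt_channel):
    """Try each channel in order; return the one that worked, or None."""
    for channel in plan:
        if attempt_channel(channel, message):
            return channel
    return None
```

With this shape, a low-priority message whose push delivery fails simply stops, while a high-priority one moves on to SMS if the user allows it.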

Avoiding double delivery

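Retries and fallback both risk sending the same message twice. To prevent this, the status store records each successful delivery, and the service checks it before retrying or giving up on a channel. Idempotency keys let a repeated request be recognized as a duplicate rather than sent again.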
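
A rough sketch of that guard follows. The `StatusStore` class is an in-memory stand-in invented for the example, and the key format `message_id:channel` is an assumption, as is the idea that the provider deduplicates requests carrying the same key. A real service would also need the check and the write to be atomic across workers, which this sketch does not attempt.

```python
class StatusStore:
    """In-memory stand-in for a shared delivery status store (hypothetical)."""

    def __init__(self):
        self._delivered = {}

    def is_delivered(self, message_id):
        return message_id in self._delivered

    def mark_delivered(self, message_id, channel):
        self._delivered[message_id] = channel


def send_once(store, message_id, channel, provider_send):
    """Send unless the message is already recorded as delivered.

    Returns "skipped" if a delivery was already recorded, else "sent".
    """
    if store.is_delivered(message_id):
        return "skipped"
    # The same key is reused on every retry of this message on this channel,
    # so a provider that honors it can drop a repeated request.
    idempotency_key = f"{message_id}:{channel}"
    provider_send(channel, idempotency_key)
    # Record success before any retry or fallback logic runs again.
    store.mark_delivered(message_id, channel)
    return "sent"
```

If `provider_send` raises, nothing is recorded, so a later retry or fallback attempt is still allowed to proceed.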

Key idea

Retry transient failures with backoff and fall back to another channel for important messages while guarding against double sends.

Check yourself

1. Which errors should be retried?

2. What is a fallback channel?

3. How is double delivery avoided during retries?