Delivery can fail
Providers time out, tokens expire, and networks drop. A robust service treats a send as best effort and reacts to failure rather than assuming success.
Retry strategy
- Transient errors like timeouts get retried with exponential backoff and jitter.
- Permanent errors like an invalid number are not retried; the channel is marked dead instead.
- A cap on the number of attempts stops infinite retry loops.
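The three rules above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the exception classes, the function names, and the default values (base delay 0.5 s, cap 30 s, 5 attempts) are all assumptions made for the example. The delay uses "full jitter", one common variant among several.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical: a failure worth retrying, e.g. a timeout."""


class PermanentError(Exception):
    """Hypothetical: a failure retrying cannot fix, e.g. an invalid number."""


def backoff_delay(attempt, base=0.5, cap=30.0):
    """Full-jitter delay: a random value in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def send_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it succeeds, fails permanently, or attempts run out.

    Returns "delivered", "dead", or "exhausted".
    """
    for attempt in range(max_attempts):
        try:
            send()
            return "delivered"
        except PermanentError:
            return "dead"  # no retry; the caller marks the channel dead
        except TransientError:
            # Wait before the next attempt, but not after the last one.
            if attempt < max_attempts - 1:
                sleep(backoff_delay(attempt))
    return "exhausted"
```

The `sleep` parameter is injectable so the loop can be exercised without real waiting. Jitter spreads retries out so that many clients failing at once do not all retry at the same moment.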
Fallback channels
When a preferred channel keeps failing, the service can fall back to another: for example, if push fails, send SMS. Fallback respects user preferences and message priority, so it is not used for low-value messages.
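A rough sketch of that fallback decision follows. The `Message` type, the two priority values, and the `deliver_with_fallback` signature are hypothetical, chosen only to show how preferences and priority can gate the fallback.

```python
from dataclasses import dataclass


@dataclass
class Message:
    body: str
    priority: str  # assumed values for this sketch: "high" or "low"


def deliver_with_fallback(message, channels, senders, allowed_channels):
    """Try channels in order; fall back only for high-priority messages.

    channels: ordered channel names, preferred first.
    senders: maps channel name -> callable returning True on success.
    allowed_channels: channels the user has opted into.
    Returns the name of the channel that delivered, or None.
    """
    # User preferences: never use a channel the user has not allowed.
    candidates = [c for c in channels if c in allowed_channels]
    # Priority: low-value messages only get the first allowed channel.
    if message.priority != "high":
        candidates = candidates[:1]
    for channel in candidates:
        if senders[channel](message):
            return channel
    return None
```

For instance, with `channels=["push", "sms"]` and a failing push sender, a high-priority message is delivered over SMS, while a low-priority one is simply dropped after push fails.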
Avoiding double delivery
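Retries and fallback risk sending the same message twice. To guard against this, each message carries an idempotency key and every successful delivery is recorded in the status store; the service checks that record before it retries or gives up on a channel and moves to the next.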
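One way to picture the check is the sketch below. `StatusStore` here is an in-memory stand-in invented for the example; a real store would need to be durable and would need an atomic "record if absent" operation, which this sketch does not provide.

```python
class StatusStore:
    """In-memory stand-in for a durable status store (assumption)."""

    def __init__(self):
        self._delivered = {}

    def mark_delivered(self, key, channel):
        # setdefault keeps the first recorded channel for a key.
        self._delivered.setdefault(key, channel)

    def delivered_via(self, key):
        return self._delivered.get(key)


def send_once(store, key, channel, send):
    """Send unless the store already shows a delivery for this key.

    Returns the channel that delivered the message, or None if send failed.
    """
    already = store.delivered_via(key)
    if already is not None:
        return already  # skip: an earlier attempt already succeeded
    if send():
        store.mark_delivered(key, channel)
        return channel
    return None
```

This only covers successes the service itself observed. If a provider delivers a message but its acknowledgement is lost, the store shows nothing; covering that case would need the provider to deduplicate on the same key, where the provider supports it.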
Key idea
Retry transient failures with backoff and fall back to another channel for important messages while guarding against double sends.