System Design

Tail Latency Amplification

Why fanning out to many services turns a rare slow call into a likely one.

One slow call decides the page

A request often fans out to many backends and waits for all of them. The user sees the slowest reply, so the tail of the slowest component becomes the latency of the whole page.
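Here is a minimal Python sketch of that fan-out shape, using asyncio. The backend names and delays are hypothetical stand-ins for real RPCs. The point is that `asyncio.gather` waits for every reply, so the measured page time follows the slowest backend, not the average.

```python
import asyncio
import time


async def backend(name: str, delay: float) -> str:
    # Hypothetical backend that replies after `delay` seconds.
    await asyncio.sleep(delay)
    return name


async def render_page(delays: dict[str, float]) -> tuple[list[str], float]:
    """Fan out to every backend in parallel and wait for all replies."""
    start = time.perf_counter()
    replies = await asyncio.gather(*(backend(n, d) for n, d in delays.items()))
    return list(replies), time.perf_counter() - start


replies, elapsed = asyncio.run(render_page({"ads": 0.01, "feed": 0.02, "profile": 0.30}))
# elapsed is about 0.30s: set by the slowest backend, not the sum or the average
print(replies, f"{elapsed:.2f}s")
```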

The math of fan-out

Suppose each call independently has a 1 percent chance of being slow. A single call is then rarely slow, but a page that waits on 100 such calls is slow most of the time.

  • One call has a 99 percent chance of being fast.
  • With one hundred calls, the chance that all are fast is 0.99 to the power 100, only about 37 percent.
  • So roughly two in three pages (about 63 percent) hit at least one slow call.
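The arithmetic above fits in a few lines of Python. This is a sketch that assumes, as the bullets do, that calls are slow independently of each other. The helper names are hypothetical. The last function inverts the formula to show how small each backend's slow rate must be to keep the page's slow rate at a target.

```python
def p_all_fast(p_slow: float, n: int) -> float:
    """Probability that all n independent calls are fast."""
    return (1.0 - p_slow) ** n


def p_page_slow(p_slow: float, n: int) -> float:
    """Probability that at least one of n independent calls is slow."""
    return 1.0 - p_all_fast(p_slow, n)


def max_per_call_slow_rate(page_target: float, n: int) -> float:
    """Largest per-call slow probability that keeps P(page slow) <= page_target."""
    return 1.0 - (1.0 - page_target) ** (1.0 / n)


for n in (1, 10, 100):
    print(f"n={n:>3}: P(page slow) = {p_page_slow(0.01, n):.3f}")
# n=  1: P(page slow) = 0.010
# n= 10: P(page slow) = 0.096
# n=100: P(page slow) = 0.634

# To keep only 1 percent of pages slow with 100 calls, each call may be slow
# only about 0.01 percent of the time.
print(f"{max_per_call_slow_rate(0.01, 100):.4%}")  # 0.0100%
```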

This is tail latency amplification: parallel fan-out turns an event that is rare for each call into one that is common for each page.
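A quick Monte Carlo check makes the amplification visible without the formula. It is a sketch under simplified assumptions: each call takes either a hypothetical 10 ms or 1000 ms, it is slow 1 percent of the time independently of the others, and page latency is the maximum over the fan-out. With enough trials, the slow-page fraction should land close to the 1, 10, and 63 percent figures computed above.

```python
import random

FAST_MS = 10.0    # hypothetical latency of a normal call
SLOW_MS = 1000.0  # hypothetical latency of a slow call
P_SLOW = 0.01     # each call is slow 1 percent of the time


def call_latency(rng: random.Random) -> float:
    """One backend call: usually fast, occasionally slow (independent draws)."""
    return SLOW_MS if rng.random() < P_SLOW else FAST_MS


def page_latency(rng: random.Random, fan_out: int) -> float:
    """The page waits for every call, so its latency is the slowest reply."""
    return max(call_latency(rng) for _ in range(fan_out))


def slow_page_fraction(fan_out: int, trials: int = 10_000, seed: int = 42) -> float:
    """Fraction of simulated pages that hit at least one slow call."""
    rng = random.Random(seed)
    slow = sum(page_latency(rng, fan_out) >= SLOW_MS for _ in range(trials))
    return slow / trials


for n in (1, 10, 100):
    print(f"fan-out {n:>3}: {slow_page_fraction(n):.1%} of pages slow")
```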

Defenses

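  • Cut fan-out by batching or caching so fewer calls are needed.
  • Hedge requests: if a reply has not arrived after a short delay, send a backup copy to a second replica and use whichever answer comes back first.
  • Set tight timeouts, and when a straggler misses its deadline, serve a degraded response (for example, omit that section of the page).
  • Trim the tail at each backend, since shaving p99 there pays off on every page that fans out to it.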
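The hedging and timeout defenses combine naturally. Below is a minimal asyncio sketch, not a production client. `hedged`, `fake_replica`, and the replica numbering are hypothetical. The sketch also assumes the call is idempotent, so sending it twice is safe. It sends the request to replica 0 and adds a backup to replica 1 if no reply arrives within `hedge_after`. It returns whichever reply lands first and falls back to a degraded value once `timeout` expires.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def hedged(call: Callable[[int], Awaitable[T]], hedge_after: float,
                 timeout: float, fallback: T) -> T:
    """Hedged request with an overall deadline and a degraded fallback."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    tasks = {asyncio.ensure_future(call(0))}  # primary replica
    try:
        done, _ = await asyncio.wait(tasks, timeout=min(hedge_after, timeout))
        if not done:
            tasks.add(asyncio.ensure_future(call(1)))  # hedge: backup to replica 1
            remaining = max(0.0, deadline - loop.time())
            done, _ = await asyncio.wait(
                tasks, timeout=remaining, return_when=asyncio.FIRST_COMPLETED
            )
        if done:
            return done.pop().result()  # first reply wins (re-raises if it failed)
        return fallback  # every replica straggled past the deadline: degrade
    finally:
        for task in tasks:
            task.cancel()  # stop the losing call(s) so they don't waste work


async def fake_replica(replica: int) -> str:
    # Hypothetical backend: replica 0 is stuck in its tail, replica 1 is healthy.
    await asyncio.sleep(0.5 if replica == 0 else 0.02)
    return f"reply from replica {replica}"


print(asyncio.run(hedged(fake_replica, hedge_after=0.05, timeout=0.2, fallback="degraded")))
# prints "reply from replica 1": the backup answered long before the stuck primary
```

Hedging trades a little extra load for a shorter tail. Delaying the backup, for example until about the backend's p95 latency, means only a small fraction of requests ever send a second copy.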

Key idea

Parallel fan-out amplifies the tail, so a page is only as fast as its slowest dependency, and you must attack p99 at every hop.

Check yourself

1. Why does fanning out to many services amplify tail latency?

2. Which technique directly reduces tail latency amplification?