Tail Latency Amplification

One slow call decides the page

A request often fans out to many backends and waits for all of them. The page is done only when the slowest reply arrives, so the tail latency of any one backend becomes the latency of the whole page.

The math of fan out

If each call independently has a 1 percent chance of being slow, a single call is rarely slow. But a page that waits on 100 such calls is slow more often than not.

One call has a 99 percent chance of being fast.
One hundred independent calls are all fast with probability 0.99 to the power 100, which is about 37 percent.
So about 63 percent of pages, nearly two in three, hit at least one slow call.

This is tail latency amplification, where parallel fan out turns a rare event into a common one.
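
The arithmetic above can be checked with a short Python sketch. It assumes slow calls are independent and uses the 1 percent figure from the text; the function names `p_any_slow` and `simulate` are illustrative, not part of any library.

```python
import random


def p_any_slow(n_calls: int, p_slow: float) -> float:
    """Probability that at least one of n_calls independent calls is slow."""
    return 1.0 - (1.0 - p_slow) ** n_calls


def simulate(n_calls: int, p_slow: float, pages: int = 20_000, seed: int = 0) -> float:
    """Estimate the same probability by simulating page loads."""
    rng = random.Random(seed)
    slow_pages = sum(
        # A page is slow if any one of its calls is slow.
        any(rng.random() < p_slow for _ in range(n_calls))
        for _ in range(pages)
    )
    return slow_pages / pages


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, round(p_any_slow(n, 0.01), 3))
    # prints:
    # 1 0.01
    # 10 0.096
    # 100 0.634
```

The same formula shows why cutting fan out helps: going from 100 calls to 10 drops the share of slow pages from about 63 percent to about 10 percent.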

Defenses

Cut fan out by batching or caching so fewer calls are needed.
Hedge requests by sending a backup to a second replica when the first is slow to answer, and use whichever reply arrives first.
Set tight timeouts with degraded fallbacks for stragglers.
Trim the tail at each backend, since shaving p99 there pays off everywhere.
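
Two of these defenses, hedged requests and timeouts with a fallback, can be sketched with Python's `asyncio`. This is a minimal illustration, not a production client: `fake_replica` is a hypothetical stand-in for a real backend call, and the delay and timeout values are arbitrary assumptions.

```python
import asyncio


async def fake_replica(replica: int) -> str:
    # Hypothetical backend: replica 0 is the straggler, replica 1 is healthy.
    await asyncio.sleep(0.5 if replica == 0 else 0.01)
    return f"reply from replica {replica}"


async def hedged(make_call, hedge_after: float):
    """Call replica 0; if it has not answered within hedge_after seconds,
    also call replica 1 and return whichever reply arrives first."""
    primary = asyncio.create_task(make_call(0))
    done, _ = await asyncio.wait({primary}, timeout=hedge_after)
    if done:
        return primary.result()
    backup = asyncio.create_task(make_call(1))
    done, pending = await asyncio.wait(
        {primary, backup}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # stop the loser so it does not keep consuming resources
    return done.pop().result()


async def with_fallback(call, timeout: float, fallback):
    """Wait at most timeout seconds, then give up and return a degraded result."""
    try:
        return await asyncio.wait_for(call, timeout)
    except asyncio.TimeoutError:
        return fallback


if __name__ == "__main__":
    print(asyncio.run(hedged(fake_replica, hedge_after=0.05)))
    print(asyncio.run(with_fallback(fake_replica(0), 0.05, "cached summary")))
```

Waiting a short time before sending the backup keeps the extra load small, because only requests that are already slower than usual get a second copy. This sketch returns the first call to finish even if it failed; a real client would likely wait for the other reply in that case.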

Key idea

Parallel fan out amplifies the tail, so a page is only as fast as its slowest dependency and you must attack p99 at every hop.
