Percentile Latency and Tail Tolerance

Averages hide the pain

If most requests are fast but a few are very slow, the average looks fine while real users suffer. This is why reliability work measures percentiles. The 95th percentile (p95) is the latency that 95 percent of requests come in at or under, and the 99th percentile (p99) reaches further into the slow tail.
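A small sketch can make the gap between the average and the percentiles concrete. The latency values below are made up for illustration, and the percentile helper uses the simple nearest-rank definition; monitoring systems often use interpolation or histogram estimates instead.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in milliseconds: 97 fast requests and 3 very slow ones.
latencies_ms = [20] * 97 + [2000] * 3

mean_ms = statistics.mean(latencies_ms)   # 79.4
p50_ms = percentile(latencies_ms, 50)     # 20
p95_ms = percentile(latencies_ms, 95)     # 20
p99_ms = percentile(latencies_ms, 99)     # 2000

print(f"mean={mean_ms} p50={p50_ms} p95={p95_ms} p99={p99_ms}")
```

In this made-up sample the mean matches no request that anyone actually experienced, and even p95 misses the slow requests. Only p99 shows the two-second responses, which is why the high percentiles are the ones worth tracking.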

The tail dominates experience

A single page often makes many backend calls. If each call independently has a 1 percent chance of being slow, a page with 100 calls has about a 63 percent chance (1 - 0.99^100) of hitting at least one slow call, and the odds approach certainty as the number of calls grows. When the page has to wait for every call, the tail latency of a dependency becomes the typical latency of the whole page. This is tail amplification.
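The arithmetic behind tail amplification is short enough to check directly. This sketch assumes the calls are independent and that each has the same chance of being slow, which real systems only approximate; the function name is illustrative.

```python
def prob_any_slow(p_slow, n_calls):
    """Probability that at least one of n_calls independent calls is slow,
    given that each call is slow with probability p_slow."""
    return 1 - (1 - p_slow) ** n_calls


# Each call is slow 1% of the time (its p99); vary the fan-out per page.
for n_calls in (1, 10, 100, 500):
    print(f"{n_calls:>3} calls: {prob_any_slow(0.01, n_calls):.1%} of pages hit a slow call")
```

With a fan-out of 100 the result is roughly 63 percent, and at 500 calls it is above 99 percent, so what is a rare event for one dependency is the common case for the page.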

Taming the tail

Hedged requests send a second copy of a request that is taking unusually long and take whichever response returns first.
Timeouts and retries cut off the worst cases instead of waiting forever.
Load shedding drops excess work so the rest stays fast.
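As one illustration of these techniques, here is a minimal sketch of a hedged request using Python's asyncio. The names `hedged`, `make_request`, `hedge_after_s`, and `fake_call` are hypothetical, the delays are invented, and the backend is simulated with `asyncio.sleep`; a real client would also need to decide how to handle errors and whether the request is safe to send twice.

```python
import asyncio


async def hedged(make_request, hedge_after_s):
    """Start one request. If it has not finished after hedge_after_s seconds,
    start a second copy and return whichever finishes first."""
    primary = asyncio.create_task(make_request())
    done, _ = await asyncio.wait({primary}, timeout=hedge_after_s)
    if done:
        return primary.result()

    # The primary is slow: send the hedge and race the two attempts.
    backup = asyncio.create_task(make_request())
    done, pending = await asyncio.wait(
        {primary, backup}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # stop waiting on the loser
    return done.pop().result()


async def demo():
    # Simulated backend (assumption): the first attempt is slow, the second fast.
    delays = iter([0.5, 0.01])

    async def fake_call():
        delay = next(delays)
        await asyncio.sleep(delay)
        return delay

    return await hedged(fake_call, hedge_after_s=0.05)


result = asyncio.run(demo())
print(result)  # 0.01: the hedge answered before the slow primary
```

Waiting a short time before sending the second copy keeps the extra load small, because only requests that are already slow get duplicated. A hard timeout can be layered on top by wrapping the call in `asyncio.wait_for`.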

Key idea

Users feel the tail, so measure high percentiles and design so a few slow calls cannot dominate the whole request.
