System Design

Percentile Latency and Tail Tolerance

Why averages lie and the slowest requests decide how your service feels.

Averages hide the pain

If most requests are fast but a few are very slow, the average can look healthy while real users suffer. That is why reliability work measures percentiles. The 95th percentile (p95) is the latency that 95 percent of requests come in under. The 99th percentile (p99) is the latency that 99 percent come in under, so it exposes the slowest 1 percent of requests: the tail.
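To see the gap concretely, here is a small Python sketch. It uses hypothetical latencies (98 requests at 10 ms and 2 at 1000 ms) and the nearest-rank percentile definition. Monitoring systems often interpolate between samples instead, so their exact numbers can differ slightly.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

# Hypothetical latencies in milliseconds: 98 fast requests, 2 very slow ones.
latencies = [10] * 98 + [1000, 1000]

mean = sum(latencies) / len(latencies)
print(f"mean={mean:.1f} p50={percentile(latencies, 50)} "
      f"p95={percentile(latencies, 95)} p99={percentile(latencies, 99)}")
# mean=29.8 p50=10 p95=10 p99=1000
```

The mean of 29.8 ms matches no real request. The median user sees 10 ms, while p99 reveals the one-second tail that the average smears away.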

The tail dominates experience

A single page often makes many backend calls, and it is not done until the slowest one returns. If each call independently has a 1 percent chance of being slow, a page with 100 calls hits at least one slow call with probability 1 − 0.99^100 ≈ 63 percent. So the tail latency of a dependency becomes the typical latency of the whole page: more than half of page loads pay it. This is tail amplification.
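The arithmetic behind tail amplification fits in one line. This sketch assumes every call is slow independently with the same probability, which real systems only approximate.

```python
def prob_at_least_one_slow(p_slow, n_calls):
    """Chance that a page making n_calls independent calls hits at least
    one slow call, given each call is slow with probability p_slow."""
    return 1 - (1 - p_slow) ** n_calls

for n in (1, 10, 100, 1000):
    print(n, round(prob_at_least_one_slow(0.01, n), 3))
# 1 0.01
# 10 0.096
# 100 0.634
# 1000 1.0
```

With a 1 percent slow rate, ten calls already give nearly a 10 percent chance of a slow page, and at a thousand calls a slow page is effectively guaranteed.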

Taming the tail

  • Hedged requests send a second copy of a request that is still outstanding after a short delay, usually to a different replica, and use whichever response returns first.
  • Timeouts and retries cut off the worst cases instead of waiting forever.
  • Load shedding drops excess work so the rest stays fast.
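A hedged request combined with an overall timeout can be sketched with Python's asyncio. The names `hedged` and `make_backend`, the simulated backend, and the delay values are hypothetical illustrations, not any particular library's API. A common choice for the hedge delay is roughly the dependency's p95 latency, so only about 5 percent of requests send an extra copy.

```python
import asyncio

async def hedged(call, hedge_delay, timeout):
    """Run call(); if it is still pending after hedge_delay seconds, start a
    second copy (the hedge) and return whichever finishes first. The whole
    attempt is abandoned after timeout seconds with asyncio.TimeoutError."""
    async def race():
        tasks = [asyncio.create_task(call())]
        try:
            done, _ = await asyncio.wait(tasks, timeout=hedge_delay)
            if not done:
                tasks.append(asyncio.create_task(call()))  # send the hedge
                done, _ = await asyncio.wait(
                    tasks, return_when=asyncio.FIRST_COMPLETED)
            return done.pop().result()  # re-raises if that copy failed
        finally:
            for task in tasks:
                task.cancel()  # stops the slower copy; no effect on finished tasks
    return await asyncio.wait_for(race(), timeout)

def make_backend(latencies):
    """Hypothetical backend: successive calls take the given latencies (seconds)."""
    remaining = iter(latencies)
    async def call():
        delay = next(remaining)
        await asyncio.sleep(delay)
        return f"done after {delay}s"
    return call

async def main():
    # The first copy hits the tail (1 s); the hedge sent at 50 ms takes 10 ms.
    backend = make_backend([1.0, 0.01])
    return await hedged(backend, hedge_delay=0.05, timeout=0.5)

print(asyncio.run(main()))  # done after 0.01s
```

Cancelling the losing copy keeps the extra load small. This sketch returns the first copy to finish even if it failed; a production version would usually wait for the other copy on error and route the hedge to a different replica.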

Key idea

Users feel the tail, so measure high percentiles and design so a few slow calls cannot dominate the whole request.

Check yourself

1. Why are percentiles preferred over averages for latency?

2. What is tail amplification?

3. How does a hedged request help the tail?