Averages hide pain
If you report only the average latency, you can hide a serious problem. A service that responds in twenty-five milliseconds on average might still leave one in a hundred users waiting two seconds while everyone else gets an answer in five milliseconds. Those slow requests are the tail.
Percentiles tell the truth
A percentile describes a threshold. The p99 latency is the value at or below which ninety-nine percent of requests complete, so the slowest one percent take at least that long. It captures how bad the slow experiences are.
- p50, the median, describes the typical request.
- p99 and p999 describe the painful tail that users remember.
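As a rough illustration, the sketch below computes the mean alongside p50, p99, and p999 (read here as the 99.9th percentile). The data is hypothetical, and `percentile` is an illustrative helper that uses the nearest-rank convention. This is only one of several percentile definitions, so monitoring tools may report slightly different values.

```python
import math
from statistics import mean


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical data: 985 fast requests (5 ms) and 15 slow ones (2000 ms).
latencies_ms = [5.0] * 985 + [2000.0] * 15

avg_ms = mean(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)
p999_ms = percentile(latencies_ms, 99.9)

print(f"mean={avg_ms:.1f} ms  p50={p50_ms} ms  p99={p99_ms} ms  p999={p999_ms} ms")
```

With this made-up data the mean lands in the tens of milliseconds and the median stays at five, while p99 and p999 both report the full two-second wait.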
Why tails compound
A page often makes many backend calls in parallel and waits for all of them. If each call independently has a one percent chance of being slow, a page with a hundred calls will hit at least one slow call about 63 percent of the time. The tail of one service becomes the median experience of the whole page.
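The fan-out arithmetic can be sketched in a few lines. The sketch assumes every call is slow independently with the same probability, which real services only approximate. The name `p_any_slow` and the call counts are illustrative.

```python
def p_any_slow(n_calls, p_slow=0.01):
    """Probability that at least one of n_calls is slow, assuming each
    call is independently slow with probability p_slow."""
    if n_calls < 0:
        raise ValueError("n_calls must be non-negative")
    if not 0.0 <= p_slow <= 1.0:
        raise ValueError("p_slow must be in [0, 1]")
    # All calls are fast with probability (1 - p_slow) ** n_calls.
    return 1.0 - (1.0 - p_slow) ** n_calls


for n in (1, 10, 100, 500):
    print(f"{n:>4} calls: {p_any_slow(n):.1%} chance of at least one slow call")
```

Under these assumptions a page crosses the fifty percent mark at 69 calls, so from that point on the typical page load includes at least one tail-latency call.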
Key idea
Tail latency measured at p99 matters more than the average because a page that fans out to many calls is only as fast as its slowest one, so slow requests define the real user experience.