Why averages lie
The mean latency of a service can look healthy while many users suffer. A few very slow requests barely move the average, so teams report percentiles instead.
Reading a percentile
A percentile names the value at or below which a given share of requests falls.
- p50 is the median: half of requests are faster.
- p90 means nine in ten requests are faster.
- p99 marks the slow tail: only one request in a hundred is slower.
If p50 is 20 ms but p99 is 800 ms, most calls feel snappy yet one percent stall badly.
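As a minimal sketch of how these numbers come out of raw data, the snippet below computes the mean, p50, and p99 with the nearest-rank method. The `percentile` helper and the sample latencies are made up for illustration, and nearest-rank is only one of several common percentile definitions.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[max(rank, 1) - 1]

# Hypothetical latencies in ms: 98 fast requests and 2 very slow ones.
latencies = [20] * 98 + [800] * 2

mean = sum(latencies) / len(latencies)
p50 = percentile(latencies, 50)
p99 = percentile(latencies, 99)

print(f"mean={mean} ms, p50={p50} ms, p99={p99} ms")
```

With this made-up data the mean is 35.6 ms, which looks healthy, while p50 is 20 ms and p99 is 800 ms.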
Why the tail matters
A single page often fans out to many backend calls. The slowest one decides the page time, so a user is exposed to the tail far more than the median suggests.
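A rough way to quantify this exposure, assuming backend calls are independent and each has a 1% chance of landing beyond p99 (both simplifying assumptions, not measurements), is to compute the chance that at least one call in the fan-out is a tail call. The function name `chance_of_tail_hit` is hypothetical.

```python
def chance_of_tail_hit(fanout, tail_share=0.01):
    """Probability that at least one of `fanout` independent calls lands in the slow tail."""
    return 1 - (1 - tail_share) ** fanout

# Compare a single call with pages that fan out to 10 and 100 backend calls.
for fanout in (1, 10, 100):
    print(fanout, round(chance_of_tail_hit(fanout), 3))
```

Under these assumptions a page that makes 100 backend calls hits the p99 tail roughly 63% of the time, even though any single call does so only 1% of the time.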
Measuring well
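Percentiles cannot be averaged across machines: the mean of per-machine p99 values is not the fleet-wide p99. Compute them from the pooled raw samples, or merge histograms and read the percentile from the merged counts. Track p50, p99, and p99.9 (often written p999) together to see both the typical request and the far tail.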
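The sketch below illustrates the point with two hypothetical machines. It compares the average of per-machine p99 values against the p99 of the pooled samples, then shows that merging per-machine histograms recovers the pooled answer. For simplicity each distinct latency value acts as its own histogram bucket; real histograms use bucket ranges and so give approximate results. All names and data here are illustrative assumptions.

```python
import math
from collections import Counter

def percentile(samples, p):
    """Nearest-rank percentile from raw samples."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def histogram_percentile(hist, p):
    """Nearest-rank percentile from a {value: count} histogram."""
    total = sum(hist.values())
    rank = math.ceil(p * total / 100)
    seen = 0
    for value in sorted(hist):
        seen += hist[value]
        if seen >= rank:
            return value

# Hypothetical raw samples (ms) from two machines with very different load.
machine_a = [10] * 1000              # busy and uniformly fast
machine_b = [10] * 90 + [500] * 10   # quiet, with a slow tail

# Wrong: average the per-machine percentiles.
averaged_p99 = (percentile(machine_a, 99) + percentile(machine_b, 99)) / 2

# Right: pool the raw samples, or merge histogram counts.
pooled_p99 = percentile(machine_a + machine_b, 99)
merged_p99 = histogram_percentile(Counter(machine_a) + Counter(machine_b), 99)

print(averaged_p99, pooled_p99, merged_p99)
```

Here the averaged figure is 255 ms, while the pooled samples and the merged histogram both give 10 ms, because the quiet machine contributes less than 1% of all requests. Averaging ignores how many requests each machine served.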
Key idea
Watch percentiles, not averages, because the tail at p99 is what frustrated users actually feel.