← Lessons

quiz vs the machine

Platinum1850

System Design

Percentile Computation

Estimating tail latencies from buckets and why averaging fails.

6 min read · advanced · beat Platinum to climb

Why percentiles, not averages

A mean hides the tail. The ninety ninth percentile, the value below which ninety nine percent of observations fall, captures the slow requests users actually feel. Metrics systems estimate percentiles from histograms or sketches.

Estimating from a histogram

Given cumulative bucket counts:

  • Find the rank you want, such as the count times zero point nine nine.
  • Locate the bucket whose cumulative count crosses that rank.
  • Interpolate linearly within that bucket between its lower and upper boundary.

Accuracy is bounded by bucket width, so a wide bucket at the tail gives a fuzzy result.

Why you cannot average percentiles

Percentiles are not linear. Averaging the ninety ninth percentile of two instances does not give the combined ninety ninth percentile. You must aggregate the underlying buckets first, then compute the percentile once.

Sketches as an alternative

Data structures like t digest or DDSketch store compact summaries that merge correctly and give accurate quantiles without fixed buckets.

Key idea

Percentiles come from merging buckets or sketches first then interpolating, never from averaging per instance percentiles, which is mathematically wrong.

Check yourself

Answer to earn rating on the learn ladder.

1. Why can you not average per instance percentiles?

2. What bounds the accuracy of a histogram derived percentile?

3. What advantage do sketches like t digest offer?