The tail latency problem
Even a fast service has slow outliers. One unlucky request might hit a busy node, a garbage collection pause, or a cold cache. These rare outliers set the high percentiles, such as the 99th, and those are the delays users notice.
What hedging does
Request hedging sends a second copy of a request to another replica if the original has not responded within a short delay. The client then uses whichever response returns first and cancels the other.
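The following is a minimal sketch of that race using Python's asyncio. It assumes exactly two replicas and a fixed delay in seconds. The names `hedged_call` and `fake_send` are hypothetical, and `send` stands in for whatever client call reaches one replica.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def hedged_call(
    send: Callable[[str], Awaitable[T]],  # hypothetical: issues one request to one replica
    replicas: List[str],                  # assumed: at least two replicas
    hedge_delay: float,                   # seconds to wait before sending the hedge
) -> T:
    # Start the original request on the first replica.
    primary = asyncio.create_task(send(replicas[0]))
    done, _ = await asyncio.wait({primary}, timeout=hedge_delay)
    if primary in done:
        # Fast path: the original answered before the delay, so no hedge is sent.
        return primary.result()

    # The original is slow: race a backup copy on the second replica.
    backup = asyncio.create_task(send(replicas[1]))
    done, pending = await asyncio.wait(
        {primary, backup}, return_when=asyncio.FIRST_COMPLETED
    )
    # Cancel the loser and wait for the cancellation to be processed.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    # Use whichever finished first (in this sketch, an error from it propagates).
    return done.pop().result()


async def fake_send(replica: str) -> str:
    # Hypothetical replica: "slow" simulates a stalled node, anything else answers fast.
    await asyncio.sleep(1.0 if replica == "slow" else 0.01)
    return replica


if __name__ == "__main__":
    # The original stalls, so the hedge to "fast" wins.
    print(asyncio.run(hedged_call(fake_send, ["slow", "fast"], hedge_delay=0.05)))
```

Waiting on the original alone first means a fast original never triggers a second request. One simplification deserves mention: if the first task to finish fails, this sketch returns its error instead of waiting for the other copy. A real client might prefer to wait for the other copy.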
Why the delay matters
- Fire too early and you nearly double load for little benefit, because most originals would have returned on their own.
- Instead, fire at a high percentile of the latency distribution, for example the 95th percentile. Most requests then finish before the hedge would fire, and only the slow tail gets hedged.
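One way to pick that delay is to track recent latencies and read off the chosen percentile. The sketch below does this. `HedgeDelay` is a hypothetical helper, not a library API. It assumes latencies in milliseconds and uses the nearest-rank percentile over a sliding window, and the window size and fallback default are illustrative.

```python
import math
from collections import deque


class HedgeDelay:
    """Hypothetical helper: tracks recent latencies and returns the hedge delay
    as a chosen percentile of them (nearest-rank method)."""

    def __init__(self, percentile: float = 95.0, window: int = 1000, default: float = 10.0):
        self.percentile = percentile         # e.g. 95 means "hedge after the p95 latency"
        self.samples = deque(maxlen=window)  # sliding window of recent latencies (ms)
        self.default = default               # assumed fallback before any samples exist

    def record(self, latency_ms: float) -> None:
        self.samples.append(latency_ms)

    def delay(self) -> float:
        if not self.samples:
            return self.default
        ordered = sorted(self.samples)
        # Nearest rank: smallest sample with at least `percentile`% of samples at or below it.
        rank = math.ceil(self.percentile * len(ordered) / 100)
        return ordered[max(rank, 1) - 1]
```

For example, after recording latencies of 1, 2, ..., 100 ms, `delay()` returns 95.0. About 5% of requests would then outlast the delay and be hedged.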
Costs and guards
- Extra load: every hedge is a duplicate request, so hedges must stay a small fraction of traffic. With the delay at the 95th percentile, roughly 5% of requests are hedged.
- Idempotency: both copies of a hedged request may execute, because cancelling the loser does not undo work it has already done. The operation must therefore be safe to repeat.
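Both guards can be enforced at the point where a hedge is about to be sent. In the sketch below, `HedgeBudget` is a hypothetical class with an assumed 5% cap. It refuses a hedge for non-idempotent operations, and it also refuses one whenever the hedge would push hedges above the cap as a share of all requests.

```python
class HedgeBudget:
    """Hypothetical guard: permits a hedge only for idempotent requests and only
    while hedges stay at or below max_ratio of all requests seen so far."""

    def __init__(self, max_ratio: float = 0.05):
        self.max_ratio = max_ratio  # assumed cap, e.g. 5% of traffic
        self.requests = 0
        self.hedges = 0

    def on_request(self) -> None:
        # Call once per original request.
        self.requests += 1

    def try_hedge(self, idempotent: bool) -> bool:
        if not idempotent:
            # Both copies may run, so never duplicate an unsafe operation.
            return False
        if self.hedges + 1 > self.max_ratio * self.requests:
            # Sending this hedge would exceed the load budget.
            return False
        self.hedges += 1
        return True
```

With 100 requests recorded and a 5% cap, at most five hedges are allowed. Counting since startup is the simplest possible budget. A production version would more likely use a sliding window or a token bucket so the cap follows recent traffic.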
Key idea
Request hedging trims tail latency by racing a delayed backup against a slow original. Set the hedge delay high, near a tail percentile, so the extra load stays small. Make sure the operation is idempotent, because both copies may run.