Request Hedging

Cut tail latency by sending a backup request when the first one runs slow, then take the winner.

The tail latency problem

Even a fast service has slow outliers. One unlucky request might hit a busy node, a garbage collection pause, or a cold cache. These outliers dominate the high percentiles that users feel.

What hedging does

Request hedging sends a second copy of a request to another replica if the first has not answered within a short delay, then uses whichever response returns first and cancels the other.
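A minimal sketch of that race, assuming an asyncio client. The names hedged_call, primary, backup, and hedge_delay are made up for illustration; primary and backup stand for coroutine functions that send the same request to two different replicas.

```python
import asyncio


async def hedged_call(primary, backup, hedge_delay):
    """Return the first response, hedging only if primary is slow.

    primary and backup are hypothetical zero-argument coroutine functions
    that issue the same request against different replicas.
    hedge_delay is in seconds.
    """
    first = asyncio.create_task(primary())

    # Give the original request a head start of hedge_delay seconds.
    done, _ = await asyncio.wait({first}, timeout=hedge_delay)
    if done:
        # Fast path: the original answered in time, so no hedge is sent.
        return first.result()

    # Slow path: send the backup and race the two requests.
    second = asyncio.create_task(backup())
    done, pending = await asyncio.wait(
        {first, second}, return_when=asyncio.FIRST_COMPLETED
    )

    # Cancel the loser and wait for the cancellation to finish.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)

    # If the winner failed, result() re-raises its exception. A fuller
    # version might wait for the other request instead.
    return done.pop().result()
```

Cancelling the loser only stops the client from waiting. The replica may already have done the work, which is why the costs below still apply.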

Why the delay matters

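Fire too early and you nearly double the load for little benefit. Fire too late and the backup rarely finishes before the original.
Fire near a high percentile of the latency distribution, for example once a request has been outstanding longer than the 95th percentile, so most requests never hedge and only the slow tail does.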
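One way to pick the delay is to read it off recent latency samples. The sketch below uses a nearest-rank percentile; the function name hedge_delay_ms and the sample data are assumptions for illustration, not part of any particular library.

```python
import math


def hedge_delay_ms(latencies_ms, percentile=95.0):
    """Nearest-rank percentile of observed latencies, used as the hedge delay."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # Nearest-rank method: the smallest sample such that at least
    # `percentile` percent of all samples are less than or equal to it.
    rank = math.ceil(percentile * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def hedged_fraction(latencies_ms, delay_ms):
    """Fraction of requests that would still be outstanding at delay_ms."""
    slow = sum(1 for latency in latencies_ms if latency > delay_ms)
    return slow / len(latencies_ms)
```

With a delay at the 95th percentile, roughly 5% of requests are slow enough to trigger a hedge, which bounds the extra load at about that fraction.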

Costs and guards

Extra load: hedging adds work, so it must stay a small fraction of traffic.
Idempotency: hedged requests can both execute, so the operation must be safe to repeat.
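Both guards can be sketched in a few lines. HedgeBudget is a hypothetical client-side counter that refuses a hedge once hedges would exceed a chosen share of requests. IdempotentHandler is a hypothetical server-side wrapper that applies each request key at most once. Both are single-threaded, in-memory illustrations under those assumptions, not production designs.

```python
class HedgeBudget:
    """Client side: allow hedges only up to max_ratio of all requests."""

    def __init__(self, max_ratio=0.05):
        self.max_ratio = max_ratio
        self.requests = 0
        self.hedges = 0

    def record_request(self):
        self.requests += 1

    def try_hedge(self):
        # Refuse if one more hedge would push the ratio over the budget.
        if self.requests == 0:
            return False
        if (self.hedges + 1) / self.requests > self.max_ratio:
            return False
        self.hedges += 1
        return True


class IdempotentHandler:
    """Server side: run the operation once per key, then replay the result."""

    def __init__(self, apply):
        self._apply = apply   # the real operation (assumed to be supplied)
        self._results = {}    # request key -> stored result

    def handle(self, key, payload):
        # A hedged duplicate carries the same key and gets the stored result.
        if key not in self._results:
            self._results[key] = self._apply(payload)
        return self._results[key]
```

The original request and its hedge would share one key, so whichever copy reaches a replica second is answered from the stored result instead of running again. This only works if replicas share the result store, which the sketch does not model.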

Key idea

Request hedging trims tail latency by racing a delayed backup against a slow original. The hedge delay must be tuned high so extra load stays small, and the work must be idempotent because both copies may execute.
