System Design

Request Hedging

Cut tail latency by sending a backup request when the first one runs slow, then take the winner.

5 min read · advanced

The tail latency problem

Even a fast service has slow outliers. One unlucky request might hit a busy node, a garbage collection pause, or a cold cache. These outliers dominate the high percentiles that users feel.

What hedging does

Request hedging sends a second copy of a request to another replica if the first has not responded within a short delay, then uses whichever response returns first and cancels the other.
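
The race described above can be sketched with Python's asyncio. This is a minimal illustration, not a production client: `hedged_call` and `fake_replica` are hypothetical names invented for this example, and the replicas are simulated with sleeps rather than real network calls.

```python
import asyncio
from typing import Awaitable, Callable, Sequence, TypeVar

T = TypeVar("T")


async def hedged_call(
    replicas: Sequence[Callable[[], Awaitable[T]]],
    hedge_delay: float,
) -> T:
    """Call replicas[0]; if it has not finished after hedge_delay seconds,
    also call replicas[1]. Return the first result and cancel the loser."""
    primary = asyncio.ensure_future(replicas[0]())
    done, _ = await asyncio.wait({primary}, timeout=hedge_delay)
    if done:
        return primary.result()  # fast path: no hedge was sent

    # The primary is slow, so race a backup against it.
    backup = asyncio.ensure_future(replicas[1]())
    tasks = {primary, backup}
    try:
        done, _ = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        # Simplification: if the first finisher failed, its exception propagates.
        # A fuller version might wait for the other request instead.
        return done.pop().result()
    finally:
        for task in tasks:
            if not task.done():
                task.cancel()  # stop the slower request


def fake_replica(name: str, latency: float) -> Callable[[], Awaitable[str]]:
    """Hypothetical stand-in for a replica that answers after `latency` seconds."""
    async def call() -> str:
        await asyncio.sleep(latency)
        return name
    return call


if __name__ == "__main__":
    slow, fast = fake_replica("replica-a", 0.5), fake_replica("replica-b", 0.01)
    print(asyncio.run(hedged_call([slow, fast], hedge_delay=0.05)))
```

In this sketch the backup is only created once the delay expires, so a request that answers quickly costs no extra work.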

Why the delay matters

  • Fire too early and you double load for little benefit.
  • Fire near a high percentile of the latency distribution, for example at the 95th percentile (p95), so that roughly 95% of requests never hedge and only the slow tail does.
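
One way to pick the delay is to take a percentile of recently observed latencies. The sketch below assumes latencies are collected in milliseconds and uses the nearest-rank percentile method; `hedge_delay_from_samples` and `hedge_fraction` are hypothetical helper names, and a real system would more likely read this value from a metrics histogram.

```python
import math
from typing import Sequence


def hedge_delay_from_samples(latencies_ms: Sequence[float], percentile: float = 95.0) -> float:
    """Return the nearest-rank percentile of the samples, to use as the hedge delay."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # 1-based nearest rank: the smallest value with at least `percentile`% of samples at or below it.
    rank = math.ceil(percentile * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def hedge_fraction(latencies_ms: Sequence[float], delay_ms: float) -> float:
    """Fraction of requests still outstanding at delay_ms, i.e. the share that would hedge."""
    return sum(1 for x in latencies_ms if x > delay_ms) / len(latencies_ms)


if __name__ == "__main__":
    samples = list(range(1, 101))  # made-up latencies: 1 ms, 2 ms, ..., 100 ms
    delay = hedge_delay_from_samples(samples, 95.0)
    print(delay, hedge_fraction(samples, delay))
```

With the delay at p95, only about one request in twenty is still outstanding when the hedge fires, which bounds the extra load at roughly 5%.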

Costs and guards

  • Extra load: hedging adds work, so it must stay a small fraction of traffic.
  • Idempotency: hedged requests can both execute, so the operation must be safe to repeat.
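
Both guards can be sketched in a few lines. This is an illustration under stated assumptions, not a standard API: `HedgeBudget` is a hypothetical client-side counter that refuses hedges once they exceed a fixed share of traffic, and `IdempotentHandler` is a hypothetical server-side wrapper that runs an operation once per idempotency key. The in-memory dictionary stands in for a store shared by all replicas, and the code is not safe for concurrent use as written.

```python
from typing import Any, Callable, Dict


class HedgeBudget:
    """Allow a hedge only while hedges / requests stays at or below max_ratio."""

    def __init__(self, max_ratio: float = 0.05) -> None:
        self.max_ratio = max_ratio
        self.requests = 0
        self.hedges = 0

    def record_request(self) -> None:
        self.requests += 1

    def try_hedge(self) -> bool:
        if self.requests == 0:
            return False
        if (self.hedges + 1) / self.requests > self.max_ratio:
            return False  # budget exhausted: wait for the original instead
        self.hedges += 1
        return True


class IdempotentHandler:
    """Run `operation` at most once per key and replay the stored result afterwards."""

    def __init__(self, operation: Callable[[Any], Any]) -> None:
        self._operation = operation
        self._results: Dict[str, Any] = {}  # assumption: stands in for a shared store

    def handle(self, key: str, payload: Any) -> Any:
        if key not in self._results:
            self._results[key] = self._operation(payload)
        return self._results[key]


if __name__ == "__main__":
    budget = HedgeBudget(max_ratio=0.1)
    for _ in range(25):
        budget.record_request()
    print([budget.try_hedge() for _ in range(4)])

    handler = IdempotentHandler(lambda amount: f"charged {amount}")
    # The original and the hedge carry the same key, so the charge runs once.
    print(handler.handle("req-1", 10), handler.handle("req-1", 10))
```

The budget keeps hedging from amplifying an overload, while the key lets the original and its hedge share one side effect.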

Key idea

Request hedging trims tail latency by racing a delayed backup against a slow original, but the hedge delay must be tuned high so extra load stays small, and the work must be idempotent so a duplicate execution is safe.

Check yourself

1. What does request hedging primarily improve?

2. Why set the hedge delay near a high latency percentile?

3. What property must a hedged operation have?