The problem of stragglers
In a replicated service, most replicas answer quickly, but one may stall because of a slow disk, a garbage collection pause, or a noisy neighbor. Waiting on that one straggler inflates tail latency.
The hedge
A hedged request sends the query to one replica; if no answer arrives within a short delay, it sends a second copy to another replica and takes whichever response returns first.
- First try goes to one replica as usual.
- After a delay near the p95 latency, a backup fires.
- Whichever response arrives first is used, and the other request is cancelled.
Because the backup fires only when the first request is already slow, the extra load stays small: with the delay at p95, roughly 5% of requests send a backup.
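The steps above can be sketched with Python's asyncio. This is a minimal illustration, not a production client: the names `hedged`, `fake_replica`, and `demo` are hypothetical, the replicas are simulated with sleeps, and error handling (for example, a winner that fails) is left out.

```python
import asyncio


async def hedged(call, replicas, hedge_delay):
    """Query replicas[0]; if it has not answered within hedge_delay seconds,
    also query replicas[1] and return whichever answers first."""
    primary = asyncio.create_task(call(replicas[0]))
    # Give the primary a head start of hedge_delay seconds.
    done, _ = await asyncio.wait({primary}, timeout=hedge_delay)
    if done:
        return primary.result()
    # The primary is slow: fire the backup and race the two.
    backup = asyncio.create_task(call(replicas[1]))
    done, pending = await asyncio.wait(
        {primary, backup}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # the loser is cancelled
    # Assumption: the winner succeeded; a failed winner would raise here.
    return done.pop().result()


async def fake_replica(name):
    # Hypothetical stand-in for a network call: "a" stalls, "b" is fast.
    await asyncio.sleep(1.0 if name == "a" else 0.01)
    return name


def demo():
    # The primary "a" stalls, so the backup to "b" fires after 50 ms and wins.
    return asyncio.run(hedged(fake_replica, ["a", "b"], hedge_delay=0.05))
```

When the primary answers before `hedge_delay` elapses, no backup is created at all, which is why the added load tracks the fraction of slow requests rather than the total request rate.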
Tuning the delay
- Too short a delay duplicates many requests and wastes capacity.
- Too long a delay means the backup fires after the user has already waited.
- Set it near p95 so only genuine stragglers are hedged.
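To make the trade-off concrete, the sketch below picks the delay from recorded latencies and estimates how many requests would send a backup. The latency values are made up for illustration, and `percentile` and `hedge_rate` are hypothetical helper names; a real service would more likely read the percentile from its metrics system.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile: the smallest sample such that at least
    q percent of all samples are at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def hedge_rate(samples, delay):
    """Fraction of requests still unanswered at `delay`,
    i.e. the fraction that would fire a backup."""
    return sum(1 for s in samples if s > delay) / len(samples)


# Made-up latencies in milliseconds: 95 fast requests and 5 stragglers.
latencies_ms = [10 + i % 10 for i in range(95)] + [200, 250, 300, 400, 500]

delay_ms = percentile(latencies_ms, 95)        # 19 ms for this data
at_p95 = hedge_rate(latencies_ms, delay_ms)    # 0.05: only the stragglers hedge
too_short = hedge_rate(latencies_ms, 12)       # 0.7: most requests are duplicated
```

With this data, a delay at p95 hedges only the five stragglers, while a 12 ms delay would duplicate 70% of requests.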
Caution
Hedging is safe for idempotent reads. For writes, you need deduplication so that the backup does not apply the change a second time.
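One common way to deduplicate is to tag each logical write with a request ID that the primary and backup copies share. The sketch below is a toy in-memory version under strong assumptions: `DedupStore` is a hypothetical name, and both copies are assumed to reach the same store. In a real replicated system the record of seen IDs would itself have to be shared consistently across replicas.

```python
class DedupStore:
    """Toy in-memory store that applies each request ID at most once."""

    def __init__(self):
        self.balance = 0
        self._results = {}  # request_id -> result of the first application

    def add(self, request_id, amount):
        # A repeated ID means this is the hedged copy of a write that
        # was already applied: return the saved result, change nothing.
        if request_id in self._results:
            return self._results[request_id]
        self.balance += amount
        self._results[request_id] = self.balance
        return self.balance


store = DedupStore()
first = store.add("req-1", 10)   # primary copy applies the change
second = store.add("req-1", 10)  # backup copy is recognized and ignored
```

Both calls return the same result and the balance is incremented only once, so the client can safely take whichever copy answers first.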
Key idea
Hedged requests trim the tail by racing a delayed backup against a straggler, adding little load when the delay is set near a high percentile.