Saying no on purpose
When demand exceeds capacity, a service cannot serve everyone. Load shedding is the deliberate choice to reject some requests quickly so that the rest succeed, instead of accepting all of them and serving every one slowly.
Why slow failure is worse
If a saturated service accepts every request, queues grow, latency climbs, and clients time out anyway. The server still does the work, but the responses arrive after clients have given up, so that work is wasted, and client retries add even more load. Shedding early means the rejected requests cost almost nothing and the admitted ones stay fast.
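A rough fluid-model calculation makes "clients time out anyway" concrete. It assumes a constant arrival rate λ above a constant service rate μ, a FIFO queue, a fixed client timeout τ, and no retries; the numbers are illustrative, not measurements. Q(t) is the backlog after t seconds of overload and W(t) is how long a request arriving at time t waits.

```latex
Q(t) = (\lambda - \mu)\,t, \qquad W(t) = \frac{Q(t)}{\mu} = \frac{(\lambda - \mu)\,t}{\mu}

\text{With } \lambda = 1200\ \text{req/s},\ \mu = 1000\ \text{req/s},\ \tau = 1\ \text{s}:\quad
W(t) = 0.2\,t\ \text{s}, \qquad W(t) > \tau \iff t > 5\ \text{s}
```

Under these assumptions every request arriving after t = 5 s waits longer than τ, so once the server reaches those requests (around t = 6 s) it stays fully busy producing responses nobody is waiting for. Retries would only raise λ and make this happen sooner.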
Choosing what to drop
- Trigger rejection on queue depth or a latency threshold, so that shedding starts only when the service is actually saturated.
- Shed low-priority traffic first, so that critical requests keep flowing.
- Return a fast, clear rejection signal so that clients back off instead of hammering the service with retries.
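The three rules above can be sketched as a small admission controller. Everything here is hypothetical and illustrative: the names AdmissionController, Priority, and Decision, the three priority tiers, and the 50% / 80% / 100% depth limits are assumptions, not a real library API. The sketch uses queue depth as the saturation signal (a latency threshold could replace it), assumes a single-threaded caller, and leaves it to the caller to turn a rejection into something like an HTTP 503 Service Unavailable response with a Retry-After header.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Priority(IntEnum):
    # Illustrative tiers (assumption); a real service defines its own.
    LOW = 0
    NORMAL = 1
    CRITICAL = 2


@dataclass
class Decision:
    admitted: bool
    # When rejected: a backoff hint the caller could send as a
    # Retry-After header alongside a 503 status.
    retry_after_s: Optional[int] = None


class AdmissionController:
    """Hypothetical queue-depth load shedder with per-priority limits.

    Not thread-safe: assumes a single-threaded caller.
    """

    def __init__(self, max_depth: int, retry_after_s: int = 1):
        self.retry_after_s = retry_after_s
        self.depth = 0  # requests admitted but not yet finished
        # Each tier may fill the queue only up to its own limit, so LOW
        # is shed first (at 50%), NORMAL next (80%), CRITICAL last (100%).
        self.limits = {
            Priority.LOW: max_depth // 2,
            Priority.NORMAL: max_depth * 4 // 5,
            Priority.CRITICAL: max_depth,
        }

    def try_admit(self, priority: Priority) -> Decision:
        if self.depth >= self.limits[priority]:
            # Fast rejection: nothing is queued, and the client gets a
            # backoff hint instead of waiting for a slow timeout.
            return Decision(admitted=False, retry_after_s=self.retry_after_s)
        self.depth += 1
        return Decision(admitted=True)

    def finish(self) -> None:
        """Call when an admitted request completes, freeing its slot."""
        if self.depth > 0:
            self.depth -= 1
```

With max_depth=10, low-priority requests are rejected once 5 are in flight and normal ones once 8 are, while critical requests are still admitted up to 10. Static per-tier limits keep the sketch simple; a production shedder might adjust them from observed latency instead.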
The goodput goal
The aim is to maximize goodput, the rate of useful completed work, rather than raw throughput. A service that admits less but finishes what it admits while clients are still waiting delivers more useful work than one that chokes on everything.
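A small deterministic simulation can contrast throughput with goodput. The simulate function below is a hypothetical sketch, not a real library, and rests on simplifying assumptions: time advances in ticks, a fixed number of requests arrives each tick, the server completes a fixed number per tick in FIFO order, a response counts as useful only if it finishes within timeout_ticks of its arrival, there are no retries, and the server never drops expired requests on its own. Passing max_queue turns on simple queue-depth shedding.

```python
from collections import deque
from typing import Optional, Tuple


def simulate(arrivals_per_tick: int, capacity_per_tick: int,
             timeout_ticks: int, ticks: int,
             max_queue: Optional[int] = None) -> Tuple[float, int, int]:
    """Return (goodput per tick, wasted completions, rejected requests).

    max_queue=None admits everything; otherwise a request that arrives
    while max_queue requests are already waiting is rejected at once.
    """
    queue = deque()  # arrival tick of each waiting request, FIFO order
    good = wasted = rejected = 0
    for t in range(ticks):
        # Admission: shed only when the queue is already full.
        for _ in range(arrivals_per_tick):
            if max_queue is not None and len(queue) >= max_queue:
                rejected += 1
            else:
                queue.append(t)
        # Service: the server completes up to capacity_per_tick requests.
        for _ in range(min(capacity_per_tick, len(queue))):
            arrived = queue.popleft()
            # Useful only if the client is still waiting for the response.
            if t - arrived <= timeout_ticks:
                good += 1
            else:
                wasted += 1
    return good / ticks, wasted, rejected


print(simulate(12, 10, 2, 100))                # admit everything
print(simulate(12, 10, 2, 100, max_queue=20))  # shed at a depth of 20
```

With 12 arrivals and a capacity of 10 per tick, a 2-tick timeout, and 100 ticks, both runs complete 1000 requests, so raw throughput is identical. Admitting everything prints (1.5, 850, 0): only 150 responses arrive in time. Shedding at a depth of 20 prints (10.0, 0, 190): 190 requests are turned away immediately, but every completed response is useful.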
Key idea
Under overload, shed low-value work early to protect goodput, because admitting everything turns into slow failure for everyone.