Load Shedding Under Pressure

Why a saturated service should drop work early instead of slowly failing every request.

Saying no on purpose

When demand exceeds capacity, a service cannot serve everyone. Load shedding is the deliberate choice to reject some requests quickly so the rest succeed, instead of accepting every request and serving all of them slowly.

Why slow failure is worse

If a saturated service accepts every request, queues grow, latency climbs, and clients time out anyway. The server still does the work, but the result is wasted because the client has already given up, and client retries pile more load onto a service that is already behind. Shedding early means the rejected requests cost almost nothing and the admitted ones stay fast.

Choosing what to drop

Reject based on a queue-depth or latency threshold, so the trigger reflects real saturation.
Prefer to shed low-priority traffic first, keeping critical requests flowing.
Return a fast, clear signal so clients can back off rather than hammer the service.
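
The three rules above can be sketched as a small admission check. This is a minimal illustration, not a prescribed design: the LoadShedder class, the Priority levels, the per-priority queue limits (50, 80, 100), and the HTTP-style 503 status with a retry hint are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    """Hypothetical priority classes; lower value means more important."""
    CRITICAL = 0
    NORMAL = 1
    LOW = 2


@dataclass
class Decision:
    admitted: bool
    status: int             # HTTP-style status code (assumed convention)
    retry_after_s: int = 0  # back-off hint sent with a rejection


class LoadShedder:
    """Admit or reject requests by comparing queue depth to a per-priority limit."""

    def __init__(self, limits=None, retry_after_s=1):
        # Illustrative thresholds: low-priority traffic is shed first,
        # critical traffic last.
        self.limits = limits or {
            Priority.LOW: 50,
            Priority.NORMAL: 80,
            Priority.CRITICAL: 100,
        }
        self.retry_after_s = retry_after_s
        self.queue_depth = 0  # requests admitted but not yet finished

    def try_admit(self, priority: Priority) -> Decision:
        # Reject immediately, before any real work is done.
        if self.queue_depth >= self.limits[priority]:
            return Decision(False, 503, self.retry_after_s)
        self.queue_depth += 1
        return Decision(True, 200)

    def finish(self) -> None:
        # Call when an admitted request completes.
        self.queue_depth = max(0, self.queue_depth - 1)
```

With the queue 60 deep, a low-priority request is rejected on the spot while a critical one is still admitted. The rejection path touches only a counter and a dictionary lookup, which is what keeps shed requests cheap.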

The goodput goal

The aim is to maximize goodput, the rate of useful completed work, not raw throughput. A service that admits less but finishes what it admits delivers more useful work than one that chokes on everything.
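
The gap between throughput and goodput can be seen in a toy simulation. The sketch below assumes a FIFO queue, a fixed number of arrivals and completions per tick, and a server that keeps processing requests even after the client has timed out. The simulate function and all of its parameters are hypothetical names chosen for the example.

```python
from collections import deque


def simulate(arrivals_per_tick, capacity_per_tick, timeout_ticks, ticks, max_queue=None):
    """Return (completed, good): requests finished, and those finished within the timeout.

    max_queue=None means every request is admitted; otherwise arrivals are
    shed whenever the queue already holds max_queue requests.
    """
    queue = deque()  # stores the arrival tick of each waiting request
    completed = 0
    good = 0
    for now in range(ticks):
        for _ in range(arrivals_per_tick):
            if max_queue is not None and len(queue) >= max_queue:
                continue  # shed: rejected before any work is done
            queue.append(now)
        for _ in range(min(capacity_per_tick, len(queue))):
            arrived = queue.popleft()
            completed += 1
            # Work finished after the client gave up counts as throughput only.
            if now - arrived <= timeout_ticks:
                good += 1
    return completed, good


# Offered load is twice the capacity in both runs.
no_shedding = simulate(20, 10, timeout_ticks=2, ticks=100)
with_shedding = simulate(20, 10, timeout_ticks=2, ticks=100, max_queue=20)
print("no shedding:  ", no_shedding)
print("with shedding:", with_shedding)
```

With these inputs both runs complete 1000 requests, so raw throughput is identical. Without shedding only the first 50 completions arrive within the timeout, because the backlog grows every tick. With the queue capped at 20, all 1000 do.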

Key idea

Under overload, shed low-value work early to protect goodput, because admitting everything just turns into slow failure for everyone.
