Concurrency

Load Shedding Under Pressure

Why a saturated service should drop work early instead of slowly failing every request.

Saying no on purpose

When demand exceeds capacity, a service cannot serve everyone. Load shedding is the deliberate choice to reject some requests quickly so that the rest succeed, instead of accepting every request and serving all of them slowly.

Why slow failure is worse

If a saturated service accepts every request, queues grow, latency climbs, and clients time out anyway. The server still does the work for requests whose clients have already given up, so that work is wasted, and the retries those clients send add even more load. Shedding early means rejected requests cost almost nothing and admitted ones stay fast.
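To get a feel for how quickly queueing turns into timeouts, here is a rough back-of-the-envelope sketch. It assumes a FIFO queue that starts empty and constant arrival and service rates; the function name and the numbers are made up for illustration.

```python
def seconds_until_timeouts(arrival_rate: float, capacity: float, timeout: float) -> float:
    """Seconds after overload begins until a new arrival's queue wait reaches the timeout.

    Assumption: FIFO queue, empty at the start, constant rates (a fluid approximation).
    """
    if arrival_rate <= capacity:
        return float("inf")  # no overload, so the backlog never grows
    backlog_growth = arrival_rate - capacity  # requests added to the queue per second
    # A request arriving at time t waits backlog(t) / capacity,
    # which is backlog_growth * t / capacity. Solve that for wait == timeout.
    return timeout * capacity / backlog_growth


# Hypothetical service: 150 req/s arriving, 100 req/s capacity, 2 s client timeout.
print(seconds_until_timeouts(150, 100, 2.0))  # 4.0
```

Under these assumptions, four seconds into the overload every newly accepted request is already destined to time out, yet the server will still spend capacity on it.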

Choosing what to drop

  • Reject based on a queue depth or latency threshold so the trigger reflects real saturation.
  • Prefer to shed low-priority traffic first, keeping critical requests flowing.
  • Return a fast, clear signal so clients can back off rather than hammer the service with retries.
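The three rules above can be sketched as a small admission check. This is a minimal illustration, not a production design: the `Shedder` class, the two-level `Priority`, the threshold values, and the choice of HTTP 503 with a `Retry-After` header as the "clear signal" are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    LOW = 0
    CRITICAL = 1


@dataclass
class Shedder:
    # Hypothetical thresholds; real values depend on measured capacity.
    soft_limit: int = 50   # at or above this depth, shed low-priority requests
    hard_limit: int = 100  # at or above this depth, shed everything
    in_flight: int = 0     # admitted requests that have not finished yet

    def try_admit(self, priority: Priority) -> bool:
        """Return True if the request is admitted, False if it is shed."""
        if self.in_flight >= self.hard_limit:
            return False
        if self.in_flight >= self.soft_limit and priority == Priority.LOW:
            return False
        self.in_flight += 1
        return True

    def done(self) -> None:
        """Call exactly once when an admitted request finishes."""
        self.in_flight -= 1


def handle(shedder: Shedder, priority: Priority) -> tuple[int, dict[str, str]]:
    """Return (status, headers); a shed request gets a cheap, immediate rejection."""
    if not shedder.try_admit(priority):
        return 503, {"Retry-After": "1"}  # assumed signal telling clients to back off
    try:
        return 200, {}  # the real work would happen here
    finally:
        shedder.done()
```

The check runs before any real work starts, which is what keeps a rejection cheap. A latency-based trigger would follow the same shape, with a measured latency compared against a threshold in place of `in_flight`.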

The goodput goal

The aim is to maximize goodput, the rate of useful completed work, not raw throughput. A service that admits less but finishes what it admits delivers more useful work than one that chokes on everything.
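A toy simulation can make the difference between throughput and goodput concrete. The sketch below is a simplified model under stated assumptions: time advances in ticks, the queue is FIFO, a completion only counts as useful if the request waited no longer than the client timeout, and shedding is modeled as a cap on queue length. The function name and all parameters are hypothetical.

```python
from collections import deque
from typing import Optional


def simulate(arrivals_per_tick: int, capacity_per_tick: int, timeout_ticks: int,
             ticks: int, max_queue: Optional[int]) -> tuple[int, int]:
    """Return (useful, wasted) completions. max_queue=None means never shed."""
    queue = deque()  # arrival tick of each waiting request
    useful = wasted = 0
    for now in range(ticks):
        for _ in range(arrivals_per_tick):
            if max_queue is None or len(queue) < max_queue:
                queue.append(now)
            # Otherwise the request is shed at once, at no cost in this model.
        for _ in range(min(capacity_per_tick, len(queue))):
            arrived = queue.popleft()
            if now - arrived <= timeout_ticks:
                useful += 1  # finished while the client was still waiting
            else:
                wasted += 1  # client already gave up; the work is thrown away
    return useful, wasted


# Same overload (15 arrivals per tick against capacity 10), with and without shedding.
accept_all = simulate(15, 10, timeout_ticks=2, ticks=100, max_queue=None)
shed_early = simulate(15, 10, timeout_ticks=2, ticks=100, max_queue=20)
print("accept everything:", accept_all)
print("shed early:       ", shed_early)
```

Both runs complete the same number of requests, so raw throughput is identical. With these made-up numbers the bounded queue finishes every admitted request in time, while the unbounded one spends most of its capacity on requests that have already timed out.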

Key idea

Under overload, shed low-value work early to protect goodput, because admitting everything just turns into slow failure for all.

Check yourself

1. Why is shedding load better than accepting everything when saturated?

2. What metric does load shedding aim to maximize?