Why shed load
When demand exceeds capacity, a system can either degrade gracefully or collapse. Without protection, queues grow, latency climbs, timeouts cascade, and useful throughput falls toward zero because capacity is spent on requests whose callers have already given up. Load shedding deliberately rejects some work so that the rest succeeds.
How shedding decides
- Detect overload from a signal such as queue depth, latency, or the number of requests in flight.
- Reject early and cheap, returning a fast error before expensive work begins.
- Prioritize so critical traffic survives while low-value or retryable work is dropped first.
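The three steps above can be sketched as a small admission gate. This is a minimal illustration, not a production design: the names `Priority`, `LoadShedder`, `try_admit`, and `release` are hypothetical, the overload signal is assumed to be in-flight concurrency, and the per-priority fractions (1.0, 0.8, 0.5) are arbitrary example values.

```python
import threading
from enum import IntEnum


class Priority(IntEnum):
    CRITICAL = 0    # must survive overload
    NORMAL = 1
    BACKGROUND = 2  # low-value or retryable, dropped first


class LoadShedder:
    """Admit or reject work based on how many requests are in flight."""

    def __init__(self, max_in_flight: int):
        self.max_in_flight = max_in_flight
        self._in_flight = 0
        self._lock = threading.Lock()
        # Assumed thresholds: the fraction of capacity each priority may fill.
        self._limit_fraction = {
            Priority.CRITICAL: 1.0,
            Priority.NORMAL: 0.8,
            Priority.BACKGROUND: 0.5,
        }

    def try_admit(self, priority: Priority) -> bool:
        """Cheap check done before any expensive work starts."""
        limit = self.max_in_flight * self._limit_fraction[priority]
        with self._lock:
            if self._in_flight >= limit:
                return False  # shed: caller returns a fast error
            self._in_flight += 1
            return True

    def release(self) -> None:
        """Call once when an admitted request finishes."""
        with self._lock:
            self._in_flight -= 1
```

A handler would call `try_admit` first, return an error immediately on `False`, and call `release` in a `finally` block after serving an admitted request. As load rises, background work is rejected first, then normal work, and critical work only when the system is completely full.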
Shedding versus rate limiting
- Rate limiting caps each client at a fixed quota regardless of system state.
- Load shedding reacts to live overload and drops whatever protects the whole system right now.
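The difference shows up in what each decision takes as input. The sketch below is illustrative only: `FixedWindowRateLimiter` and `should_shed` are hypothetical names, a fixed window is just one simple way to implement a quota, and queue depth is assumed as the overload signal. The limiter looks at the client and the clock; the shedder looks only at the current state of the system.

```python
from collections import defaultdict


class FixedWindowRateLimiter:
    """Per-client quota: the decision ignores how loaded the system is."""

    def __init__(self, quota_per_window: int, window_seconds: float):
        self.quota = quota_per_window
        self.window = window_seconds
        # (client_id, window index) -> requests seen in that window.
        # Old windows are never purged in this sketch.
        self._counts = defaultdict(int)

    def allow(self, client_id: str, now: float) -> bool:
        key = (client_id, int(now // self.window))
        if self._counts[key] >= self.quota:
            return False
        self._counts[key] += 1
        return True


def should_shed(queue_depth: int, max_queue_depth: int) -> bool:
    """Load shedding: the decision depends only on live system state."""
    return queue_depth >= max_queue_depth
```

A well-behaved client under its quota can still be shed when the queue is full, and a client over its quota is still limited when the system is idle. The two mechanisms are complementary and are often deployed together.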
Doing it well
- Cheap rejection: a shed request must cost far less than a served one.
- Avoid retry storms: tell clients to back off so rejected work does not immediately return.
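Both points can be sketched for an HTTP service, which is an assumption here since the notes do not name a protocol. On the server side, a shed request gets a constant response (status 503 with a `Retry-After` header) that needs no parsing, lookups, or body rendering. On the client side, retries wait an exponentially growing, randomized delay so that rejected clients do not return in lockstep. The function names and the default `base` and `cap` values are hypothetical.

```python
import random


def shed_response(retry_after_seconds: int) -> tuple[int, dict[str, str], bytes]:
    """Build a cheap rejection: fixed status, one header, empty body."""
    return 503, {"Retry-After": str(retry_after_seconds)}, b""


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 10.0, rng=random) -> float:
    """Exponential backoff with full jitter; attempt starts at 0.

    The ceiling doubles with each attempt up to `cap`, and the actual
    delay is drawn uniformly between 0 and that ceiling.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)
```

A client that receives the rejection would sleep for at least the `Retry-After` value, or for `backoff_delay(attempt)` if no hint is given, and give up after a bounded number of attempts. The jitter matters as much as the exponent: without it, every rejected client retries at the same instant and recreates the spike.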
Key idea
Load shedding keeps an overloaded system alive by cheaply rejecting lower-value work, trading some requests for the survival and predictable latency of the rest.