Load Shedding

When overloaded, drop low-value work quickly so the system keeps serving the rest instead of collapsing.

Why shed load

When demand exceeds capacity, a system can either degrade gracefully or collapse. Without protection, queues grow, latency explodes, and timeouts cascade as clients give up and retry. The server keeps spending capacity on requests whose callers are no longer waiting, so useful throughput falls toward zero. Load shedding deliberately rejects some work so that the rest succeeds.
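
The collapse shows up even in a toy model. The sketch below is a simplified, tick-based simulation under assumed numbers (12 arrivals per tick, capacity for 10, clients that give up after waiting more than 5 ticks); the function simulate and its max_queue parameter are illustrative names, not a library API. Without shedding, the server stays fully busy, yet only about a third of the requests it completes over 100 ticks reach a caller who is still waiting, and that share keeps shrinking as the run goes on. Capping the queue at capacity × timeout turns the excess away at the door, so every completed request is useful.

```python
from collections import deque


def simulate(arrivals_per_tick, capacity_per_tick, timeout_ticks, ticks, max_queue=None):
    """Toy FIFO server; returns (goodput, wasted, shed) request counts.

    max_queue=None models no shedding: every arrival is queued.
    """
    queue = deque()  # arrival tick of each waiting request
    goodput = wasted = shed = 0
    for now in range(ticks):
        for _ in range(arrivals_per_tick):
            if max_queue is not None and len(queue) >= max_queue:
                shed += 1  # rejected on arrival, before any work is done
            else:
                queue.append(now)
        for _ in range(min(capacity_per_tick, len(queue))):
            waited = now - queue.popleft()
            if waited <= timeout_ticks:
                goodput += 1  # caller is still waiting: useful work
            else:
                wasted += 1  # caller already timed out: capacity thrown away
    return goodput, wasted, shed


# Assumed load: 12 arrivals per tick against capacity for 10, 5-tick client timeout.
naive = simulate(12, 10, 5, 100)
capped = simulate(12, 10, 5, 100, max_queue=50)  # 50 = capacity * timeout
print(naive)   # (330, 670, 0): busy throughout, but most work is wasted
print(capped)  # (1000, 0, 160)
```

Both runs complete 1,000 requests; the difference is how many of them still matter to anyone.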

How shedding decides

Detect overload from a signal such as queue depth, latency, or the number of requests in flight.
Reject early and cheaply, returning a fast error before any expensive work begins.
Prioritize so that critical traffic survives while low-value or retryable work is dropped first.
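
The three steps can be sketched together in a few lines. This is a minimal illustration, assuming in-flight concurrency as the overload signal and three priority classes; ConcurrencyShedder, Priority, ADMIT_FRACTION, and handle are hypothetical names, and the admit fractions (100%, 80%, and 50% of the limit) are arbitrary values chosen for the example. The check runs before any real work, so a rejection costs one lock and one comparison, and low-priority traffic is turned away while headroom still remains for critical requests.

```python
import threading
from enum import IntEnum


class Priority(IntEnum):
    CRITICAL = 0
    NORMAL = 1
    LOW = 2  # e.g. retryable or background work


# Assumed share of the concurrency limit each priority may fill.
ADMIT_FRACTION = {Priority.CRITICAL: 1.0, Priority.NORMAL: 0.8, Priority.LOW: 0.5}


class ConcurrencyShedder:
    """Admits or rejects work based on how many requests are already in flight."""

    def __init__(self, limit):
        self.limit = limit
        self.in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self, priority):
        # Cheap check, made before any expensive work starts.
        with self._lock:
            if self.in_flight < self.limit * ADMIT_FRACTION[priority]:
                self.in_flight += 1
                return True
            return False  # shed: the caller should return a fast error

    def release(self):
        with self._lock:
            self.in_flight -= 1


def handle(shedder, priority, do_work):
    """Hypothetical request handler wrapping real work with the shedder."""
    if not shedder.try_acquire(priority):
        return "rejected"  # fast error; no expensive work was done
    try:
        return do_work()
    finally:
        shedder.release()  # always free the slot, even if do_work raises
```

With a limit of 10, at most 5 LOW requests, 8 NORMAL-or-lower, and 10 total can be in flight, so the last slots are reserved for critical traffic.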

Shedding versus rate limiting

Rate limiting caps each client at a fixed quota, regardless of system state.
Load shedding reacts to live overload and drops whatever work it must to protect the whole system right now.
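
To make the contrast concrete, here is a hedged sketch with two independent checks: a per-client token bucket (one common way to implement a rate limit, chosen here for illustration) and a shedding test on live queue depth. TokenBucket, should_shed, admit, and the threshold of 100 are hypothetical; the status strings borrow HTTP's 429 Too Many Requests and 503 Service Unavailable codes. The bucket admits a well-behaved client even when the server is drowning, while the shedder rejects that same client once the queue is full, which is why the two are often layered.

```python
import time


class TokenBucket:
    """Per-client rate limit: a fixed quota that ignores system load."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.clock = clock  # injectable so the example is testable
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def should_shed(queue_depth, max_queue_depth):
    """Load shedding: decided by live system state, not by who is asking."""
    return queue_depth >= max_queue_depth


def admit(client_id, buckets, queue_depth, max_queue_depth=100):
    # The two checks answer different questions.
    if not buckets[client_id].allow():
        return "429 rate limited"  # this client exceeded its own quota
    if should_shed(queue_depth, max_queue_depth):
        return "503 shed"  # the system as a whole is overloaded
    return "200 admitted"
```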

Doing it well

Cheap rejection: a shed request must cost far less than a served one.
Avoid retry storms: tell clients to back off so rejected work does not immediately return.
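
On the client side, backing off commonly means exponential backoff with random jitter, so rejected callers spread their retries out instead of returning in lockstep. The sketch below assumes a hypothetical Shed exception that carries an optional server hint (analogous to HTTP's Retry-After header) and a caller-supplied call function; backoff_delay, call_with_backoff, and the base, cap, and attempt limit are illustrative choices. It never retries sooner than the server asked and gives up after a bounded number of attempts.

```python
import random
import time


class Shed(Exception):
    """Hypothetical error raised when the server sheds a request."""

    def __init__(self, retry_after=None):
        super().__init__("request shed by server")
        self.retry_after = retry_after  # seconds hinted by the server, if any


def backoff_delay(attempt, base=0.1, cap=10.0, rng=random.random):
    """Full jitter: a random delay between 0 and min(cap, base * 2**attempt)."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_backoff(call, max_attempts=5, sleep=time.sleep):
    for attempt in range(max_attempts):
        try:
            return call()
        except Shed as err:
            if attempt == max_attempts - 1:
                raise  # bounded retries: give up rather than add load forever
            delay = backoff_delay(attempt)
            if err.retry_after is not None:
                delay = max(delay, err.retry_after)  # respect the server's hint
            sleep(delay)
```

The jitter matters as much as the exponent: without it, clients rejected in the same instant would all retry in the same instant.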

Key idea

Load shedding keeps an overloaded system alive by cheaply rejecting lower-value work, trading some requests for the survival and predictable latency of the rest.
