Load Shedding

When overloaded, drop low-value work fast so the system keeps serving the rest instead of collapsing.

Why shed load

When demand exceeds capacity, a system can either degrade gracefully or collapse. Without protection, queues grow, latency explodes, timeouts cascade, and useful throughput falls toward zero because servers spend their time on requests whose callers have already given up. Load shedding deliberately rejects some work so the rest succeeds.

How shedding decides

  • Detect overload from a signal such as queue depth, latency, or the number of requests in flight.
  • Reject early and cheaply, returning a fast error before expensive work begins.
  • Prioritize so critical traffic survives while low-value or retryable work is dropped first.
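
The three steps above can be sketched as a small admission check. This is a minimal illustration, not a production design: the `LoadShedder` class, the `Priority` levels, and the threshold fractions (0.5, 0.8, 1.0) are all hypothetical, and the overload signal used here is the number of requests in flight.

```python
import threading
from enum import IntEnum


class Priority(IntEnum):
    CRITICAL = 0  # traffic that must survive overload
    NORMAL = 1
    LOW = 2       # low-value or retryable work, shed first


class LoadShedder:
    """Admit or reject requests based on how many are already in flight."""

    def __init__(self, max_in_flight: int):
        self.max_in_flight = max_in_flight
        self._in_flight = 0
        self._lock = threading.Lock()
        # Assumed thresholds: the fraction of capacity at which each
        # priority starts being rejected. Lower priorities shed earlier.
        self._limits = {
            Priority.CRITICAL: 1.0,
            Priority.NORMAL: 0.8,
            Priority.LOW: 0.5,
        }

    def try_acquire(self, priority: Priority) -> bool:
        """Detect overload and decide with a single counter comparison."""
        with self._lock:
            if self._in_flight >= self.max_in_flight * self._limits[priority]:
                return False  # shed: no expensive work has started yet
            self._in_flight += 1
            return True

    def release(self) -> None:
        with self._lock:
            self._in_flight -= 1


def handle(shedder: LoadShedder, priority: Priority, work):
    """Reject early with a fast error, otherwise run the real work."""
    if not shedder.try_acquire(priority):
        return 503, "overloaded"
    try:
        return 200, work()
    finally:
        shedder.release()
```

With `max_in_flight=10`, low-priority requests start being rejected once 5 requests are in flight, normal ones at 8, and critical ones only at 10. The rejection path is one lock acquisition and one comparison, which keeps it far cheaper than serving the request.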

Shedding versus rate limiting

  • Rate limiting caps each client at a fixed quota regardless of system state.
  • Load shedding reacts to live overload and drops whatever work must go to protect the whole system right now.
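
The difference shows up in what each decision looks at. In the hedged sketch below, the hypothetical `FixedWindowRateLimiter` consults only a per-client counter, while the hypothetical `should_shed` function consults only a live system signal (queue depth is assumed here) and ignores who is asking.

```python
from collections import defaultdict


class FixedWindowRateLimiter:
    """Per-client quota: the decision ignores how loaded the system is."""

    def __init__(self, quota_per_window: int):
        self.quota = quota_per_window
        self.counts = defaultdict(int)

    def allow(self, client_id: str) -> bool:
        if self.counts[client_id] >= self.quota:
            return False
        self.counts[client_id] += 1
        return True

    def reset_window(self) -> None:
        # Called at the start of each window, e.g. once per second.
        self.counts.clear()


def should_shed(queue_depth: int, max_queue_depth: int) -> bool:
    """System-wide signal: the decision ignores which client is asking."""
    return queue_depth > max_queue_depth
```

A client that is well under its quota can still be shed when the queue is too deep, and a client over its quota is still limited when the system is idle. The two mechanisms answer different questions and are often used together.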

Doing it well

  • Cheap rejection: a shed request must cost far less than a served one.
  • Avoid retry storms: tell clients to back off, for example with a retry delay hint and jittered exponential backoff, so rejected work does not immediately return.
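
One common way to keep rejected work from returning immediately is for the server to send a retry hint and for the client to wait with randomized exponential backoff. The sketch below assumes an HTTP-style service that answers with status 503 and a `Retry-After` header. The function names, the 2-second hint, and the `base` and `cap` values are illustrative assumptions.

```python
import random
from typing import Optional


def shed_response(retry_after_seconds: int = 2):
    """Server side: a cheap rejection that also tells the client to wait."""
    return 503, {"Retry-After": str(retry_after_seconds)}, b""


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 10.0,
                  retry_after: Optional[float] = None) -> float:
    """Client side: seconds to wait before retry number `attempt` (0-based).

    Uses exponential backoff with full jitter, so many clients rejected at
    the same moment do not all retry at the same moment.
    """
    ceiling = min(cap, base * (2 ** attempt))
    delay = random.uniform(0, ceiling)
    if retry_after is not None:
        # Never retry sooner than the server asked.
        delay = max(delay, retry_after)
    return delay
```

A client would sleep for `backoff_delay(attempt, retry_after=...)` between attempts and give up after a bounded number of retries, so that shed work is eventually dropped rather than retried forever.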

Key idea

Load shedding keeps an overloaded system alive by cheaply rejecting lower-value work, trading some requests for the survival and predictable latency of the rest.

Check yourself

1. What is the goal of load shedding?

2. How does load shedding differ from rate limiting?