System Design

Rate Limiting at the Edge

Capping request rates near the user to protect services globally.

6 min read · advanced

Limiting Close to the Source

Rate limiting caps how many requests a client may send within a time window. Enforcing the cap at the edge rejects excess traffic close to the user, so abusive load never reaches the origin.
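To make "a cap per client per window" concrete, here is a minimal sketch of a fixed window limiter. The class name, the `clock` parameter, and the in-memory dictionary are assumptions made for illustration; a production edge limiter would also expire old buckets and handle concurrent access.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each `window_s`-second bucket."""

    def __init__(self, limit, window_s, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock  # injectable so the sketch can be exercised without sleeping
        # (client_id, bucket index) -> requests seen; old buckets are never
        # cleaned up here, which a real limiter would need to do.
        self.counts = defaultdict(int)

    def allow(self, client_id):
        bucket = int(self.clock() // self.window_s)
        key = (client_id, bucket)
        if self.counts[key] >= self.limit:
            return False  # over the cap: reject here instead of forwarding to origin
        self.counts[key] += 1
        return True
```

With `limit=3` and `window_s=60`, the fourth request from the same client inside one minute is rejected, while other clients and later windows are unaffected.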

Common Algorithms

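  • Fixed window counts requests per time bucket; simple, but a burst that straddles a window boundary can briefly reach twice the limit
  • Sliding window counts over a window that moves with the current time, smoothing the boundary problem of fixed windows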
  • Token bucket refills tokens at a steady rate and allows short bursts
  • Leaky bucket drains at a constant rate, enforcing a smooth output
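Of the algorithms above, the token bucket is perhaps the easiest to sketch in a few lines. The version below is one possible shape, not a reference implementation: `rate`, `capacity`, `cost`, and the injectable `clock` are names chosen for this example. The bucket starts full, refills lazily on each call, and never holds more than `capacity` tokens.

```python
import time


class TokenBucket:
    """Token bucket: steady refill at `rate`, bursts of up to `capacity` requests."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate          # tokens added per second (the sustained average rate)
        self.capacity = capacity  # maximum stored tokens (the largest allowed burst)
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.last = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Refill for the time elapsed since the last call, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A client that has been idle can spend the whole bucket at once, which is the short burst; after that it is held to the refill rate.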

The Distributed Challenge

Edges are spread worldwide, so a global limit requires shared state. Options trade accuracy against latency.

  • Local counters are fast but let a client exceed the global cap across nodes
  • Synced counters share state via a fast store, more accurate but slower
  • Approximate counting accepts small counting errors, such as a slight overshoot of the cap, for speed
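The gap between local and synced counters can be shown with a toy simulation. Everything here is hypothetical: the limit of 10, the three nodes, and the `Counter` class, which stands in for either a node's private counter or a central store. In a real deployment each check against the shared store would cost a network round trip, which this sketch does not model.

```python
GLOBAL_LIMIT = 10  # hypothetical per-client cap for one window


class Counter:
    """A request counter that refuses once `limit` has been reached."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def allow(self):
        if self.count >= self.limit:
            return False
        self.count += 1
        return True


def admitted(node_counters, requests_per_node):
    """Send the same number of requests through each node; return the total admitted."""
    return sum(c.allow() for c in node_counters for _ in range(requests_per_node))


# Local counters: each of three edge nodes enforces the limit on its own.
local_nodes = [Counter(GLOBAL_LIMIT) for _ in range(3)]
local_total = admitted(local_nodes, 10)

# Synced counters: all three nodes consult the same shared counter.
shared_store = Counter(GLOBAL_LIMIT)
synced_total = admitted([shared_store] * 3, 10)

print(local_total, synced_total)  # prints "30 10"
```

A client that spreads its traffic over three nodes gets three times the cap with local counters, while the shared counter holds the line at the price of coordination.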

Responding

Return a clear 429 (Too Many Requests) status with a retry hint, such as a Retry-After header, so well-behaved clients back off rather than hammer harder.
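As a sketch of what such a response might contain, the helper below builds a framework-neutral (status, headers, body) triple. The function name and the body text are invented for this example; `Retry-After` is the standard HTTP header for this kind of hint, expressed here as a delay in whole seconds.

```python
import math


def too_many_requests(seconds_until_allowed):
    """Build a (status, headers, body) triple for a rate-limited request."""
    # Round up so the client never retries early; advertise at least one second.
    retry_after = max(1, math.ceil(seconds_until_allowed))
    headers = {
        "Retry-After": str(retry_after),
        "Content-Type": "text/plain; charset=utf-8",
    }
    body = f"Rate limit exceeded. Retry in {retry_after} seconds.\n"
    return 429, headers, body
```

With a token bucket, `seconds_until_allowed` could be estimated as the missing tokens divided by the refill rate.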

Key idea

Edge rate limiting rejects excess requests near the user using algorithms like token bucket, but enforcing a global cap across many edges forces a trade-off between shared-state accuracy and latency.

Check yourself

1. Why does global rate limiting across edges require shared state?

2. Which algorithm allows short bursts while enforcing an average rate?

3. Why return a 429 with a retry hint?