Limiting Close to the Source
Rate limiting caps how many requests a client may send within a time window. Enforcing it at the edge rejects excess traffic near the user, so abusive load never reaches the origin.
Common Algorithms
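- Fixed window counts requests per time bucket; simple, but a client can send up to twice the limit in a short span straddling a window boundary
- Sliding window counts over a trailing interval, which smooths the boundary problem of fixed windows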
- Token bucket refills tokens at a steady rate and allows short bursts
- Leaky bucket drains at a constant rate, enforcing a smooth output
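As an illustration of the token bucket entry above, here is a minimal sketch. The class name `TokenBucket`, its parameters, and the choice to pass the clock in as `now` (so the behaviour is deterministic) are assumptions made for this example, not a reference implementation.

```python
class TokenBucket:
    """Hypothetical token bucket: steady refill, bursts up to `capacity`."""

    def __init__(self, capacity, refill_rate, now=0.0):
        self.capacity = capacity        # max tokens, i.e. the largest burst
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity          # start full so an initial burst is allowed
        self.last = now                 # timestamp of the last refill

    def allow(self, now, cost=1.0):
        # Refill for the elapsed time, never exceeding capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last = now
        # Spend tokens if enough are available, otherwise reject.
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(capacity=5, refill_rate=1.0)
burst = [bucket.allow(now=0.0) for _ in range(6)]  # five allowed, sixth rejected
later = bucket.allow(now=2.0)                      # two seconds refill two tokens
```

With a capacity of 5 and a refill of one token per second, the bucket absorbs a burst of five requests and then admits roughly one request per second.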
The Distributed Challenge
Edge nodes are spread worldwide, so enforcing a global limit requires shared state. The options trade accuracy against latency.
- Local counters are fast but let a client exceed the global cap across nodes
- Synced counters share state via a fast store, more accurate but slower
- Approximate counting accepts small errors in the count, so a client may slightly exceed the cap, in exchange for speed
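The gap between local and synced counters can be seen in a small simulation. This is a sketch under assumptions: `FixedWindowCounter` and `admitted` are hypothetical names, the "shared store" is just one in-memory object standing in for a real networked store, and the client is assumed to spread requests round-robin across nodes.

```python
from collections import defaultdict


class FixedWindowCounter:
    """Hypothetical per-client fixed-window counter (old windows are never evicted)."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counts = defaultdict(int)  # (client, window index) -> requests seen

    def allow(self, client, now):
        key = (client, int(now // self.window))
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True


def admitted(counters, client, requests, now=0.0):
    """Send requests round-robin across the counters; return how many pass."""
    return sum(
        counters[i % len(counters)].allow(client, now) for i in range(requests)
    )


LIMIT, NODES = 10, 3

# Local: every node keeps its own counter and enforces the full limit.
local = [FixedWindowCounter(LIMIT, 60) for _ in range(NODES)]

# Synced: every node consults the same counter (stand-in for a shared store).
shared = FixedWindowCounter(LIMIT, 60)
synced = [shared] * NODES

print(admitted(local, "client-a", 100))   # 30: the cap is multiplied by the node count
print(admitted(synced, "client-a", 100))  # 10: the global cap holds
```

In a real deployment each call to the shared counter would cost a network round trip, which is the latency side of the trade-off.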
Responding
Return a clear 429 (Too Many Requests) status with a retry hint, such as a Retry-After header, so well-behaved clients back off rather than hammer harder.
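A rejection could be built along these lines. This is a sketch only: the function name `too_many_requests` is hypothetical, and it assumes a fixed-window limiter, so the retry hint is the number of seconds until the current window resets.

```python
import math
from http import HTTPStatus


def too_many_requests(now, window_seconds):
    """Return (status, headers, body) for a rate-limited request."""
    # Seconds until the current fixed window ends, rounded up to a whole second.
    retry_after = math.ceil(window_seconds - (now % window_seconds))
    status = HTTPStatus.TOO_MANY_REQUESTS  # 429
    headers = {
        "Retry-After": str(retry_after),  # delay in seconds
        "Content-Type": "text/plain",
    }
    body = f"{status.phrase}: retry in {retry_after}s\n"
    return status.value, headers, body


status, headers, body = too_many_requests(now=45.0, window_seconds=60)
# status is 429 and headers["Retry-After"] is "15"
```

For a token bucket, the hint would instead be the time needed to refill one token.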
Key idea
Edge rate limiting rejects excess requests near the user using algorithms like token bucket, but enforcing a global cap across many edge nodes forces a trade-off between shared-state accuracy and latency.