Why Limit
Rate limiting protects an API from abuse, runaway clients, and accidental overload. It caps how many requests a caller may make within a given time window and rejects or delays the rest, usually with a 429 Too Many Requests status.
Common Algorithms
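- Fixed window counts requests per clock interval; it is simple, but a client can burst at window edges, sending up to twice the limit across a boundary.
- Sliding window smooths those edges by counting over a rolling interval, often approximated by weighting the previous window's count by how much of it still overlaps.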
- Token bucket refills tokens at a steady rate and allows short bursts.
- Leaky bucket drains requests at a constant rate to smooth output.
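As an illustration of the token bucket item above, here is a minimal single-process sketch in Python. The class name `TokenBucket`, its parameters `rate` and `capacity`, and the injectable `clock` are assumptions made for this example, not part of any particular library; a production limiter would also need locking or a shared store.

```python
import time


class TokenBucket:
    """Illustrative token bucket: refills `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens added per second (assumed parameter)
        self.capacity = capacity    # maximum burst size (assumed parameter)
        self.clock = clock          # injectable so the sketch can be tested
        self.tokens = capacity      # start full so an initial burst is allowed
        self.last = clock()

    def allow(self, cost=1):
        """Return True and spend `cost` tokens if enough are available."""
        now = self.clock()
        # Credit the tokens earned since the last call, capped at capacity.
        earned = (now - self.last) * self.rate
        self.tokens = min(self.capacity, self.tokens + earned)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With `rate=1` and `capacity=3`, a caller can send three requests back to back, is then refused, and regains one request per second of waiting. The capacity sets the burst size and the rate sets the long-run average.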
Where to Enforce
Limits usually live at the API gateway so every backend is protected uniformly. Track counters per API key or per IP in a fast shared store such as Redis so all gateway nodes agree. Tell clients their remaining quota with response headers, and include a Retry-After header on a 429 so they know when to try again.
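The per-key counting described above might look roughly like the following Python sketch. It uses the fixed window algorithm for brevity, and a plain dictionary stands in for the shared store, so it only works inside one process. The class name `FixedWindowLimiter` and the `X-RateLimit-*` header names are assumptions for illustration: those headers are a common convention rather than a standard, while `Retry-After` is a standard HTTP header.

```python
import math
import time


class FixedWindowLimiter:
    """Illustrative per-key fixed-window limiter with quota headers."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        # (key, window index) -> request count. A dict stands in for a shared
        # store here; old windows are never evicted in this sketch.
        self.counts = {}

    def check(self, key):
        """Count one request for `key`; return (status_code, headers)."""
        now = self.clock()
        window_index = int(now // self.window)
        bucket = (key, window_index)
        count = self.counts.get(bucket, 0) + 1
        self.counts[bucket] = count

        headers = {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Remaining": str(max(0, self.limit - count)),
        }
        if count > self.limit:
            # Seconds until the next window starts, rounded up, at least 1.
            wait = (window_index + 1) * self.window - now
            headers["Retry-After"] = str(max(1, math.ceil(wait)))
            return 429, headers
        return 200, headers
```

Keying the counter on the API key (or client IP) keeps one noisy caller from consuming another caller's quota. With a real shared store, the read-increment-write step would need to be atomic so that concurrent gateway nodes do not undercount.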
Key Idea
Rate limiters such as token bucket cap request rates at the gateway and reply with 429 when exceeded, protecting backends while allowing controlled bursts.