The shared budget problem
Rate limiting a single server's own traffic is easy. A global limit, such as 1,000 requests per second per customer, is harder: the customer's requests land on many servers, and together those servers must stay within one shared budget.
Approaches
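- Centralized counter: every server checks a shared store like Redis. Accurate, but every request pays a network round trip and a busy customer's counter becomes a hot key.
- Token bucket per node: split the global budget across nodes. Fast, because no network call is needed, but wastes budget when traffic is uneven: an idle node's share goes unused while a busy node rejects requests.
- Sliding window with sharing: nodes periodically report usage and adjust their local allowance. Close to local speed, but the allowance lags behind real traffic between reports.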
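To make the cost of a fixed split concrete, here is a minimal sketch of the per-node token bucket approach. The names TokenBucket and split_budget are hypothetical, the clock is passed in explicitly so the example is deterministic, and an equal split with a one-second burst capacity is an assumption, not a recommendation.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Token bucket refilled continuously at `rate` tokens per second."""
    rate: float         # refill rate, tokens per second
    capacity: float     # maximum number of stored tokens (burst size)
    tokens: float = 0.0
    last: float = 0.0   # timestamp of the last refill, in seconds

    def allow(self, now: float) -> bool:
        # Refill for the elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        # Spend one token per admitted request.
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def split_budget(global_rate: float, nodes: int) -> list[TokenBucket]:
    """Give each node an equal, fixed share of the global rate (assumed policy)."""
    share = global_rate / nodes
    return [TokenBucket(rate=share, capacity=share, tokens=share) for _ in range(nodes)]


# 1000 req/s split across 4 nodes: each node may admit 250 req/s.
buckets = split_budget(1000, 4)

# Uneven traffic: all 1000 requests hit node 0 at the same instant.
admitted = sum(buckets[0].allow(now=0.0) for _ in range(1000))
# admitted == 250, while the other three nodes still hold 750 unused tokens.
```

Node 0 rejects 750 requests even though the customer is within the global limit, which is the wasted budget the list above describes.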
The trade
The core tension is accuracy versus latency. A central check is precise, but it is slow and becomes a bottleneck; local buckets are fast, but their combined admissions can drift above or below the true global limit. Many systems therefore enforce locally and reconcile periodically as a practical balance.
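One way the reconciliation step might look is sketched below. It assumes a coordinator that receives each node's usage for the last interval and hands back new per-node allowances in proportion to that usage. The function name reconcile, the min_share parameter, and the proportional policy are all illustrative assumptions; real systems use many variations.

```python
def reconcile(global_limit: float, usage: dict[str, float],
              min_share: float = 10.0) -> dict[str, float]:
    """Reassign per-node allowances in proportion to reported usage.

    Each node keeps at least `min_share` so that a previously idle node
    can still admit some traffic if load shifts to it before the next round.
    Assumes min_share * len(usage) <= global_limit.
    """
    if not usage:
        return {}
    n = len(usage)
    total = sum(usage.values())
    if total == 0:
        # No traffic observed anywhere: fall back to an equal split.
        return {node: global_limit / n for node in usage}
    spare = global_limit - min_share * n  # budget handed out by demand
    return {node: min_share + spare * used / total
            for node, used in usage.items()}


# Usage reported for the last interval (requests per second, made-up numbers).
reported = {"a": 900, "b": 100, "c": 0, "d": 0}
allowances = reconcile(1000, reported)
# The allowances always sum to the global limit; busy nodes get most of it.
```

Between rounds each node enforces only its own allowance, so no request waits on the network. The price is that the allowances reflect the previous interval's traffic, not the current one.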
Key idea
A distributed rate limiter shares one budget across servers, trading central accuracy against local speed, often using local buckets reconciled periodically.