Requirements
- Limit a client to n requests per time window across all servers.
- Enforce limits with minimal added latency.
- Behave correctly even when servers see traffic independently.
High level design
A shared counter store coordinates limits using a token bucket or sliding window algorithm.
- Algorithm: token bucket refills tokens at a fixed rate, while a sliding window log or counter approximates a rolling window.
- Shared state: counters live in a fast store such as Redis keyed by client and window so all servers agree.
- Local optimization: servers can hold a small local batch of tokens to cut round trips to the shared store.
Bottlenecks
- Shared store latency: every check hitting Redis adds latency, so use atomic operations and local token batches.
- Race conditions: increments must be atomic so concurrent requests do not exceed the limit.
- Hot keys: a heavy client concentrates load on one key, mitigated by sharding the window.
Tradeoffs
- Strict global counting is accurate but adds a shared store round trip.
- Local token batches reduce latency but allow slight overshoot of the limit.
Key idea
A distributed rate limiter coordinates a token bucket or sliding window in a shared atomic counter store, optionally batching tokens locally to trade exactness for latency.