Design a Distributed Rate Limiter

Requirements

Limit a client to n requests per time window across all servers.
Enforce limits with minimal added latency.
Behave correctly even when servers see traffic independently.

High level design

A shared counter store coordinates limits using a token bucket or sliding window algorithm.

Algorithm: token bucket refills tokens at a fixed rate, while a sliding window log or counter approximates a rolling window.
Shared state: counters live in a fast store such as Redis keyed by client and window so all servers agree.
Local optimization: servers can hold a small local batch of tokens to cut round trips to the shared store.

Bottlenecks

Shared store latency: every check hitting Redis adds latency, so use atomic operations and local token batches.
Race conditions: increments must be atomic so concurrent requests do not exceed the limit.
Hot keys: a heavy client concentrates load on one key, mitigated by sharding the window.

Tradeoffs

Strict global counting is accurate but adds a shared store round trip.
Local token batches reduce latency but allow slight overshoot of the limit.

Key idea

A distributed rate limiter coordinates a token bucket or sliding window in a shared atomic counter store, optionally batching tokens locally to trade exactness for latency.

Design a Distributed Rate Limiter

Requirements

High level design

Bottlenecks

Tradeoffs

Key idea

Check yourself