← Lessons

quiz vs the machine

Platinum1700

System Design

Design a Distributed Rate Limiter

Enforce request quotas across many servers consistently and with low overhead.

6 min read · advanced · beat Platinum to climb

Requirements

  • Limit a client to n requests per time window across all servers.
  • Enforce limits with minimal added latency.
  • Behave correctly even when servers see traffic independently.

High level design

A shared counter store coordinates limits using a token bucket or sliding window algorithm.

  • Algorithm: token bucket refills tokens at a fixed rate, while a sliding window log or counter approximates a rolling window.
  • Shared state: counters live in a fast store such as Redis keyed by client and window so all servers agree.
  • Local optimization: servers can hold a small local batch of tokens to cut round trips to the shared store.

Bottlenecks

  • Shared store latency: every check hitting Redis adds latency, so use atomic operations and local token batches.
  • Race conditions: increments must be atomic so concurrent requests do not exceed the limit.
  • Hot keys: a heavy client concentrates load on one key, mitigated by sharding the window.

Tradeoffs

  • Strict global counting is accurate but adds a shared store round trip.
  • Local token batches reduce latency but allow slight overshoot of the limit.

Key idea

A distributed rate limiter coordinates a token bucket or sliding window in a shared atomic counter store, optionally batching tokens locally to trade exactness for latency.

Check yourself

Answer to earn rating on the learn ladder.

1. Why must counter increments be atomic?

2. What is the tradeoff of holding local token batches?