System Design

The Rate Limiter Design Recap

Capping request rates per client to protect a service, with the token bucket at the core.

Why limit rates

A rate limiter caps how many requests a client may make in a given time window, protecting a service from abuse, runaway clients, and accidental floods. It sits in front of the real work and rejects or delays excess traffic.

The token bucket model

The common design is a token bucket per client:

  • Tokens refill at a fixed rate, say ten per second, up to a cap.
  • Each request consumes one token.
  • If the bucket has a token, allow the request. If empty, reject or queue it.

The cap allows short bursts, because an idle client accumulates tokens up to the cap and can spend them all at once, while the refill rate bounds the long-run average. Other models include fixed window, sliding window, and leaky bucket.
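The model above can be sketched in a few lines of Python. This is a minimal, single-process illustration, not a production limiter: the class name `TokenBucket`, its parameters `rate` and `capacity`, and the optional `now` argument (included so the clock can be controlled in examples) are all assumptions made for this sketch.

```python
import time


class TokenBucket:
    """Token bucket for one client (illustrative sketch).

    Refills at `rate` tokens per second, up to `capacity` tokens.
    """

    def __init__(self, rate, capacity, now=None):
        self.rate = rate
        self.capacity = capacity
        # Start full so a fresh client can burst up to the cap.
        self.tokens = float(capacity)
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Return True and consume one token if available, else False."""
        now = time.monotonic() if now is None else now
        # Lazily add the tokens earned since the last call, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# One bucket per client, keyed by a hypothetical client identifier.
buckets = {}


def allow_request(client_id, rate=10, capacity=20):
    bucket = buckets.setdefault(client_id, TokenBucket(rate, capacity))
    return bucket.allow()
```

Refilling lazily on each call avoids a background timer: the bucket only needs the time of the last request and the token count at that moment. This sketch rejects when the bucket is empty; queueing the request instead would be a separate design choice.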

Where it runs

  • A limiter at the edge or gateway blocks abuse before it reaches services.
  • Counters live in a fast shared store so limits hold across many servers, not just one.
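To illustrate why the counters need to be shared, here is a small sketch in which two limiter nodes consult the same store. Everything here is hypothetical: `SharedCounterStore` is an in-memory stand-in for a real networked store, `LimiterNode` represents one server, and the sketch uses a fixed window counter rather than a token bucket purely to keep it short.

```python
import threading


class SharedCounterStore:
    """In-memory stand-in for a fast shared store (assumption for this sketch).

    A real deployment would use a networked store with an atomic
    increment and would expire keys for old windows.
    """

    def __init__(self):
        self._counts = {}
        self._lock = threading.Lock()

    def incr(self, key):
        # Atomic increment: returns the new count for the key.
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + 1
            return self._counts[key]


class LimiterNode:
    """One server's limiter, backed by the shared store."""

    def __init__(self, store, limit, window_seconds):
        self.store = store
        self.limit = limit
        self.window_seconds = window_seconds

    def allow(self, client_id, now):
        # All nodes derive the same key for the same client and window,
        # so they increment the same counter.
        window = int(now // self.window_seconds)
        count = self.store.incr(f"{client_id}:{window}")
        return count <= self.limit


store = SharedCounterStore()
node_a = LimiterNode(store, limit=3, window_seconds=60)
node_b = LimiterNode(store, limit=3, window_seconds=60)
```

If each node kept its own private counter instead, a client spreading requests across both nodes could make up to twice the intended number of requests per window.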

What to return

  • A rejected request gets a clear status, typically HTTP 429 Too Many Requests, with a retry hint such as a Retry-After header.
  • Limits are keyed per client, typically by API key, user ID, or IP address.
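As a rough sketch of both points, the helpers below pick a limiter key and build a rejection. The function names, the order of preference among identifiers, and the way the retry hint is derived from a token bucket's state are all assumptions for illustration. The status code 429 (Too Many Requests) and the `Retry-After` header are standard HTTP.

```python
import math


def client_key(api_key=None, user_id=None, ip=None):
    """Choose the key a limit is tracked under.

    Preference order (an assumption of this sketch): API key,
    then user ID, then IP address as a last resort.
    """
    if api_key:
        return f"key:{api_key}"
    if user_id:
        return f"user:{user_id}"
    if ip:
        return f"ip:{ip}"
    raise ValueError("no client identifier available")


def rejection_response(tokens, rate):
    """Build (status, headers, body) for a rejected request.

    `tokens` is the bucket's current token count and `rate` its refill
    rate in tokens per second. The hint is the whole number of seconds
    until one full token is available, with a minimum of one second.
    """
    deficit = max(0.0, 1.0 - tokens)
    retry_after = max(1, math.ceil(deficit / rate))
    headers = {"Retry-After": str(retry_after)}
    return 429, headers, "Too Many Requests"
```

Falling back to the IP address only when no API key or user ID is present is one possible policy; clients behind a shared address would otherwise share a single limit.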

Key idea

A rate limiter caps client request rates, commonly with a token bucket that refills at a fixed rate and allows bursts up to a cap, using a shared counter so limits hold across servers.

Check yourself

1. In a token bucket, what allows short bursts of traffic?

2. Why store rate limit counters in a shared store?

3. What status typically signals a rejected request?