System Design

Rate Limiting

How a service politely says slow down before it gets crushed by too many requests.

Why limit at all

Rate limiting caps how many requests a client may make in a given time window. It protects a service from abuse, runaway clients, and accidental traffic spikes, and it keeps capacity fairly shared across many users.

When a client exceeds its limit, the server typically rejects the request. Over HTTP this is status 429 (Too Many Requests), often with a Retry-After header telling the client how long to wait before retrying.
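A minimal sketch of such a rejection over HTTP, assuming the conventional 429 status and a Retry-After header given in whole seconds. The function name too_many_requests and the tuple it returns are hypothetical, chosen only for illustration.

```python
import math
from http import HTTPStatus


def too_many_requests(retry_after_seconds):
    """Build a (status, headers, body) tuple for a rate-limited request.

    Hypothetical helper: a real framework would have its own response type.
    """
    status = HTTPStatus.TOO_MANY_REQUESTS  # 429
    # Retry-After accepts a whole number of seconds, so round up
    # to avoid telling the client to retry too early.
    headers = {"Retry-After": str(math.ceil(retry_after_seconds))}
    body = f"{status.value} {status.phrase}"
    return status.value, headers, body
```

A well-behaved client reads the header and waits at least that long before sending the request again.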

The token bucket

One of the most widely used algorithms is the token bucket. A bucket holds tokens up to a fixed capacity and refills at a steady rate. Each request removes one token. If the bucket is empty, the request is rejected or delayed.

  • Capacity sets how large a burst is allowed.
  • Refill rate sets the sustained throughput over time.

This lets short bursts through while still bounding the long-run average.
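The token bucket can be sketched in a few lines. This is one possible version, not a definitive implementation: it assumes one token per request, a bucket that starts full, and lazy refilling computed from elapsed time on each call. The class name TokenBucket, its parameters, and the injectable now clock are illustrative choices.

```python
import time


class TokenBucket:
    """capacity bounds the burst size; refill_rate bounds the sustained rate."""

    def __init__(self, capacity, refill_rate, now=time.monotonic):
        self.capacity = capacity        # maximum tokens the bucket can hold
        self.refill_rate = refill_rate  # tokens added per second
        self.now = now                  # injectable clock (assumption, eases testing)
        self.tokens = float(capacity)   # start full so an initial burst is allowed
        self.last = now()

    def allow(self):
        """Return True and spend a token if one is available, else False."""
        current = self.now()
        elapsed = current - self.last
        self.last = current
        # Refill for the time that has passed, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With capacity 3 and a refill rate of 1 token per second, three back-to-back requests pass and the fourth is rejected. Two seconds later, two more tokens are available.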

Other approaches

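  • Fixed window counts requests per clock interval and resets at each boundary. A burst at the end of one window followed by another at the start of the next can let through nearly double the limit in a short span.
  • Sliding window smooths that boundary by adding a weighted share of the previous window's count to the current one.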
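The boundary problem and its fix can be compared side by side. The sketch below assumes the sliding window counter variant, which estimates the rolling count as the previous window's count, weighted by how much of it still overlaps the last full window length, plus the current count. Class names and the explicit now argument are illustrative.

```python
class FixedWindow:
    def __init__(self, limit, window):
        self.limit = limit
        self.window = window  # window length in seconds
        self.index = None     # which clock interval we are counting
        self.count = 0

    def allow(self, now):
        index = int(now // self.window)
        if index != self.index:  # new interval: counter resets to zero
            self.index = index
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


class SlidingWindowCounter:
    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.index = None
        self.current = 0
        self.previous = 0

    def allow(self, now):
        index = int(now // self.window)
        if index != self.index:
            # Carry the old count over only if it belongs to the
            # immediately preceding window; otherwise it has expired.
            adjacent = self.index is not None and index == self.index + 1
            self.previous = self.current if adjacent else 0
            self.current = 0
            self.index = index
        # Fraction of the current window that has already elapsed.
        elapsed = (now % self.window) / self.window
        estimate = self.previous * (1 - elapsed) + self.current
        if estimate < self.limit:
            self.current += 1
            return True
        return False
```

With a limit of 5 per 10 seconds, five requests at t = 9 and five more at t = 10 all pass the fixed window, ten requests within one second. The sliding window counter admits the first five and rejects the rest.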

In a distributed setup, the counter usually lives in a shared store rather than in each server's memory, so all servers agree on one tally per client.
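To illustrate why the tally must be shared, here is a rough sketch in which two servers consult the same counter. InMemorySharedStore is a hypothetical in-process stand-in for a networked store, and Server, increment, and the key format are assumptions made for the example. It uses a fixed window for brevity.

```python
import threading


class InMemorySharedStore:
    """Hypothetical stand-in for a shared store reachable by every server."""

    def __init__(self):
        self._counts = {}
        self._lock = threading.Lock()

    def increment(self, key):
        # Increment and read in one atomic step so that concurrent
        # callers never lose an update.
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + 1
            return self._counts[key]


class Server:
    def __init__(self, store, limit, window):
        self.store = store    # shared by all servers, not per-server state
        self.limit = limit
        self.window = window  # window length in seconds

    def allow(self, client_id, now):
        # One key per client per window. This sketch never expires
        # old keys; a real store would need to evict them.
        key = f"{client_id}:{int(now // self.window)}"
        return self.store.increment(key) <= self.limit
```

If each server kept its own counter instead, a client spreading requests across N servers could get up to N times the intended limit.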

Key idea

A token bucket allows bursts up to a cap while bounding the steady rate over time.

Check yourself

1. In a token bucket, what does the bucket capacity control?

2. Why can a naive fixed window allow nearly double the limit?

3. Where is the counter usually kept in a distributed rate limiter?