Rate Limiting

How a service politely says "slow down" before it gets crushed by too many requests.

Why limit at all

Rate limiting caps how many requests a client may make in a window. It protects a service from abuse, runaway clients, and accidental traffic spikes, and it keeps capacity fair across many users.

When a client exceeds its limit, the server typically returns HTTP 429 Too Many Requests, often with a Retry-After header telling the client when to retry.
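
As a rough sketch of that rejection in an HTTP service: the status is 429 Too Many Requests and the retry hint travels in the Retry-After header. The function name and the (status, headers, body) return shape are assumptions made for illustration, not any particular framework's API.

```python
import math
from http import HTTPStatus


def too_many_requests(retry_after_seconds):
    """Build a (status, headers, body) triple for a rejected request.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    # Retry-After carries whole seconds, so round up rather than
    # invite a retry that arrives slightly too early.
    seconds = max(1, math.ceil(retry_after_seconds))
    headers = {"Retry-After": str(seconds)}
    return HTTPStatus.TOO_MANY_REQUESTS, headers, "rate limit exceeded\n"
```

A well-behaved client reads the header and waits at least that long before trying again.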

The token bucket

One of the most widely used algorithms is the token bucket. A bucket holds tokens up to a fixed capacity and refills at a steady rate. Each request removes one token. If the bucket is empty, the request is rejected or delayed.

Capacity sets how large a burst is allowed.
Refill rate sets the sustained throughput over time.

This lets short bursts through while still bounding the long-run average.
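
The bucket described above can be sketched in a few lines. This is a minimal single-process version under stated assumptions: the class name, the `allow` method, and the injectable `clock` parameter are illustrative choices, and tokens are refilled lazily from elapsed time rather than by a background timer.

```python
import time


class TokenBucket:
    """Minimal token bucket; names and structure are illustrative."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity        # largest burst allowed, in tokens
        self.refill_rate = refill_rate  # tokens added per second
        self.clock = clock              # injectable so the sketch is testable
        self.tokens = float(capacity)   # the bucket starts full
        self.last = clock()

    def allow(self):
        """Spend one token and return True, or return False if empty."""
        now = self.clock()
        # Credit tokens for the time elapsed, never exceeding capacity.
        earned = (now - self.last) * self.refill_rate
        self.tokens = min(self.capacity, self.tokens + earned)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With a capacity of 3 and a refill rate of 1 token per second, three back-to-back requests pass, a fourth is rejected, and one more request becomes possible for each second that goes by.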

Other approaches

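Fixed window counts requests per clock interval, but a client can spend a full quota just before a boundary and another just after it, briefly getting through up to double the limit.
Sliding window smooths that boundary by adding the previous window's count, weighted by how much of that window still overlaps the sliding interval, to the current count.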
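
To make the boundary effect concrete, here is a small sketch of both counters. The class names are hypothetical, time is passed in explicitly as `now` (in seconds) to keep the example deterministic, and the sliding version is the common two-counter approximation rather than an exact log of request timestamps.

```python
class FixedWindowCounter:
    """Counts requests per fixed clock interval."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.index = None  # number of the window currently being counted
        self.count = 0

    def allow(self, now):
        index = int(now // self.window)
        if index != self.index:
            # A new window starts with a fresh quota.
            self.index, self.count = index, 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


class SlidingWindowCounter:
    """Approximates a sliding window by weighting the previous window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.index = 0
        self.current = 0   # requests counted in the current window
        self.previous = 0  # requests counted in the window before it

    def allow(self, now):
        index = int(now // self.window)
        if index != self.index:
            # Carry the old count over only if it is the adjacent window.
            self.previous = self.current if index == self.index + 1 else 0
            self.current = 0
            self.index = index
        # Fraction of the current window that has already elapsed.
        elapsed = (now % self.window) / self.window
        estimate = self.previous * (1 - elapsed) + self.current
        if estimate < self.limit:
            self.current += 1
            return True
        return False
```

With a limit of 10 per 60 seconds, a client that sends its full quota at second 59 gets another 10 through the fixed counter at second 61, but only 1 through the sliding counter, because most of the previous window still counts against it.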

In a distributed setup the counter usually lives in a shared store so all servers agree on one tally per client.
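
A hedged sketch of that arrangement: every server runs the same check against one store, and the only operation the limiter needs from the store is an atomic increment that returns the new value. `InMemoryStore` is a stand-in invented for this example; a real deployment would put a networked store behind the same `incr` interface, and would also expire old keys, which this sketch does not do.

```python
import threading


class InMemoryStore:
    """Stand-in for a shared store (hypothetical; for illustration only)."""

    def __init__(self):
        self._counts = {}
        self._lock = threading.Lock()

    def incr(self, key):
        # Atomic increment-and-read: the one primitive the limiter relies on.
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + 1
            return self._counts[key]


def allow(store, client_id, now, limit=100, window_seconds=60):
    """Fixed-window check that any server can run against the same store."""
    window_index = int(now // window_seconds)
    # One tally per client per window; the key format is an assumption.
    key = f"rate:{client_id}:{window_index}"
    return store.incr(key) <= limit
```

Because the increment and the read happen in one step, two servers handling the same client at the same moment cannot both see the last free slot.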

Key idea

A token bucket allows bursts up to a cap while bounding the steady rate over time.
