How it works
A token bucket holds up to a fixed capacity of tokens. Tokens are added at a steady refill rate, for example ten per second, but the bucket never overflows past its capacity. Each request must take one token to proceed. If the bucket is empty, the request is rejected or made to wait.
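The mechanism above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the class name `TokenBucket`, the method `allow`, the injectable `clock` parameter, and the choice to start with a full bucket are all assumptions made for the example. It rejects when empty; the waiting variant is left out.

```python
import time


class TokenBucket:
    """Minimal single-threaded token bucket (illustrative names throughout)."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity        # maximum number of tokens the bucket holds
        self.refill_rate = refill_rate  # tokens added per second
        self.clock = clock              # injectable time source (assumption, eases testing)
        self.tokens = capacity          # assumption: the bucket starts full
        self.last = clock()             # time of the most recent refill

    def _refill(self):
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Credit tokens for the elapsed time, but never overflow past capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)

    def allow(self):
        """Take one token if one is available; return False to reject."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Refilling lazily on each call, rather than from a background timer, keeps the sketch small. A version shared between threads would also need a lock around `allow`.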
Why it allows bursts
Because tokens accumulate up to the capacity while traffic is idle, a client can spend the saved-up tokens all at once. This permits a burst of up to the bucket's capacity, after which throughput settles back to the steady refill rate. The long-run average rate is bounded by the refill rate, while short spikes up to the capacity are allowed.
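A small simulation makes the burst behaviour concrete. The helper `simulate` below is hypothetical and replays a list of arrival times against a bucket that is assumed to start full, as it would after an idle period. The capacity of 5 is an arbitrary value for illustration; the rate of ten per second follows the earlier example.

```python
def simulate(capacity, refill_rate, arrivals):
    """Return an accept/reject decision for each arrival time.

    `arrivals` is a sorted list of times in seconds. The bucket is
    assumed to start full, as if the client had been idle beforehand.
    """
    tokens = float(capacity)
    last = 0.0
    decisions = []
    for t in arrivals:
        # Credit tokens for the time since the previous arrival, capped at capacity.
        tokens = min(capacity, tokens + (t - last) * refill_rate)
        last = t
        if tokens >= 1:
            tokens -= 1
            decisions.append(True)
        else:
            decisions.append(False)
    return decisions


# Eight requests arrive at the same instant; only a bucket's worth get through.
burst = simulate(capacity=5, refill_rate=10, arrivals=[0.0] * 8)
print(burst.count(True))  # 5
```

Once the saved-up tokens are spent, further requests are admitted only as fast as tokens are refilled.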
Tuning the two knobs
- Refill rate sets the sustained throughput.
- Capacity sets how large a burst you tolerate.
This separation is a large part of why the token bucket is a common choice for public APIs: the refill rate caps the long-run rate, while the capacity lets bursty clients get through without waiting.
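To see how the two knobs act independently, it can help to compute what a given setting implies. The two helpers below are hypothetical names for standard arithmetic: a window of t seconds admits at most capacity + refill rate × t requests, and an empty bucket takes capacity / refill rate seconds to fill. The numbers are illustrative only.

```python
def max_requests_in_window(capacity, refill_rate, window_seconds):
    """Upper bound on requests admitted in any window of the given length."""
    return capacity + refill_rate * window_seconds


def seconds_to_refill(capacity, refill_rate):
    """Time for an empty bucket to fill completely while no tokens are taken."""
    return capacity / refill_rate


# Same sustained rate (10 per second), two different burst allowances.
for capacity in (10, 100):
    print(
        capacity,
        max_requests_in_window(capacity, 10, 60),  # bound over one minute
        seconds_to_refill(capacity, 10),           # idle time needed to rebuild a full burst
    )
# prints:
# 10 610 1.0
# 100 700 10.0
```

Raising the capacity tenfold adds only 90 requests to the one-minute bound, since the refill rate dominates over long windows, but it allows a much larger instantaneous burst and takes longer to rebuild.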
Key idea
A token bucket bounds the average rate by its refill rate while permitting bursts up to its capacity.