A bucket of permits
A token bucket limits a request rate by handing out tokens. Each request spends one token; if the bucket is empty, the request is delayed or rejected. Tokens refill at a steady rate, which sets the long-run average.
The two parameters
- The refill rate sets the sustained average, for example one hundred tokens per second.
- The bucket capacity sets how big a burst is allowed. A capacity of two hundred lets two hundred requests fire at once after an idle stretch long enough to refill the bucket.
Refill is continuous: if the rate is one hundred per second, roughly one token returns every ten milliseconds rather than a hundred all at once each second.
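The two parameters and the continuous refill can be sketched in a few lines of Python. This is a minimal illustration, not a production limiter: the class name `TokenBucket`, the method `try_acquire`, and the injectable `clock` argument are all assumptions made for the example, and the sketch rejects rather than delays when the bucket is empty.

```python
import time


class TokenBucket:
    """Illustrative token bucket; names and API are hypothetical."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # tokens added per second (sustained average)
        self.capacity = float(capacity)  # maximum tokens held (largest burst)
        self.clock = clock               # injectable so the bucket can be tested
        self.tokens = float(capacity)    # assumption: the bucket starts full
        self.last = clock()

    def _refill(self):
        # Continuous refill: credit tokens in proportion to elapsed time,
        # never exceeding the capacity.
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def try_acquire(self, cost=1.0):
        # Spend `cost` tokens if available; otherwise reject the request.
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=100, capacity=200)
    allowed = sum(bucket.try_acquire() for _ in range(250))
    print(f"{allowed} of 250 back-to-back requests allowed")
```

Refilling lazily on each call, from the elapsed time, avoids a background timer and gives the same result as adding one token every ten milliseconds at a rate of one hundred per second.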
Bucket versus fixed window
A naive fixed-window counter allows a double burst at the boundary: a full window's worth of requests at the end of one window and another at the start of the next land back to back, so up to twice the limit passes in a very short span. A token bucket avoids that because the refill is smooth and the capacity caps the burst directly.
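The boundary effect can be reproduced with a small simulation. The sketch below is illustrative only: both classes, the `allow` method, and the chosen numbers (a limit of one hundred per one-second window, and a bucket with rate one hundred per second and capacity one hundred) are assumptions for the example. Time is passed in as integer milliseconds so the run is deterministic.

```python
class FixedWindow:
    """Naive fixed-window counter (hypothetical names)."""

    def __init__(self, limit, window_ms):
        self.limit = limit
        self.window_ms = window_ms
        self.window_id = None
        self.count = 0

    def allow(self, now_ms):
        window_id = now_ms // self.window_ms
        if window_id != self.window_id:
            # A new window starts: the counter resets to zero.
            self.window_id = window_id
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


class TokenBucket:
    """Token bucket driven by explicit timestamps (hypothetical names)."""

    def __init__(self, rate_per_s, capacity):
        self.rate = rate_per_s
        self.capacity = capacity
        self.tokens = float(capacity)  # assumption: starts full
        self.last_ms = 0

    def allow(self, now_ms):
        # Credit tokens for the time elapsed, capped at the capacity.
        elapsed_s = (now_ms - self.last_ms) / 1000
        self.tokens = min(self.capacity, self.tokens + elapsed_s * self.rate)
        self.last_ms = now_ms
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def burst_at_boundary(limiter):
    # 100 requests 1 ms before the window boundary, 100 more exactly on it.
    before = sum(limiter.allow(999) for _ in range(100))
    after = sum(limiter.allow(1000) for _ in range(100))
    return before + after


fixed_allowed = burst_at_boundary(FixedWindow(limit=100, window_ms=1000))
bucket_allowed = burst_at_boundary(TokenBucket(rate_per_s=100, capacity=100))
print(fixed_allowed, bucket_allowed)  # 200 100
```

In this run the fixed window admits all two hundred requests within one millisecond, while the bucket admits only its capacity of one hundred. Over a longer span the bucket still admits its capacity plus whatever refills in that time; what it rules out is the near-instant doubling at the boundary.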
Key idea
A token bucket sets the average with its refill rate and the allowed burst with its capacity, smoothing traffic without the boundary double burst of a fixed window.