Why allow bursts
Real traffic is rarely smooth. A page load fires several requests at once, a batch job flushes a queue, a user clicks rapidly. A limiter that allows only the exact steady rate would reject this normal behavior. A burst allowance lets a client briefly exceed the sustained rate, as long as the long-run average stays within budget.
How it is expressed
- A sustained rate, the average allowed over time, such as ten requests per second.
- A burst size, the number of requests that can be served back to back in a short spike, such as fifty requests.
In a token bucket the burst size is simply the bucket capacity and the sustained rate is the refill rate. The bucket fills during idle moments, so a client can spend its saved-up tokens quickly. Once the bucket is empty, the client is held to the steady rate.
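The mapping from burst size and sustained rate onto a token bucket can be sketched in a few lines of Python. This is a minimal illustration, not a production limiter: the class name `TokenBucket`, its parameters `refill_rate` and `capacity`, and the injectable `clock` are all assumptions made for the example, and the sketch ignores thread safety and per-client bookkeeping.

```python
import time


class TokenBucket:
    """Illustrative token bucket: capacity = burst size, refill_rate = sustained rate."""

    def __init__(self, refill_rate, capacity, clock=time.monotonic):
        self.refill_rate = refill_rate  # tokens added per second (sustained rate)
        self.capacity = capacity        # maximum stored tokens (burst size)
        self.clock = clock              # injectable time source (assumption, eases testing)
        self.tokens = float(capacity)   # start full so an idle client can burst right away
        self.last = clock()

    def allow(self, cost=1):
        """Spend `cost` tokens and return True if enough are stored, else return False."""
        now = self.clock()
        # Credit tokens earned since the last call, never exceeding the capacity.
        earned = (now - self.last) * self.refill_rate
        self.tokens = min(self.capacity, self.tokens + earned)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With `TokenBucket(refill_rate=10, capacity=50)`, a client that has been idle can send fifty requests immediately. After that, requests succeed only as fast as tokens are refilled, which is ten per second.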
The balance
Too small a burst frustrates legitimate spiky clients. Too large a burst lets abusers concentrate load and stress downstream systems. The right size matches how bursty honest usage really is.
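To make the trade-off concrete, the arithmetic below estimates the worst case a token bucket admits. It assumes the bucket starts full, so over any window a client can spend the whole burst plus whatever is refilled during that window. The function names are hypothetical and the numbers are the example values used earlier (ten per second, burst of fifty).

```python
def max_requests_in_window(rate_per_s, burst, window_s):
    """Upper bound on requests admitted in a window: full bucket plus refill."""
    return burst + rate_per_s * window_s


def seconds_to_refill(rate_per_s, burst):
    """Idle time needed for an empty bucket to regain its full burst."""
    return burst / rate_per_s


print(max_requests_in_window(10, 50, 1))   # 60: six times the steady rate for one second
print(max_requests_in_window(10, 50, 60))  # 650: under 11 per second averaged over a minute
print(seconds_to_refill(10, 50))           # 5.0: seconds of idling to earn the burst back
```

The short-window figure is what downstream systems must absorb, while the long-window figure shows the average staying close to the sustained rate. Raising the burst size increases the first far more than the second.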
Key idea
A burst allowance lets clients briefly exceed the steady rate so normal spiky usage is permitted while the average stays bounded.