The goal
A rate limiter caps how many requests pass in a time window, for example one hundred per second. Under concurrency many threads check and update the same counters, so the check and the increment must be atomic or two threads can both slip past the limit.
The token bucket
A common design is the token bucket:
- A bucket holds up to a fixed number of tokens.
- Tokens refill at a steady rate up to the capacity.
- Each request must remove one token to proceed, otherwise it is rejected or delayed.
The bucket allows short bursts up to capacity while bounding the long run rate.
Making it thread safe
The take a token step must be a single atomic operation. Options include:
- A mutex around the read modify write of the token count.
- An atomic compare and swap loop that retries if another thread changed the count first.
- Sharding counters per core and reconciling, to reduce contention on a hot path.
The subtle bug is a check then act race where two threads both read one token available and both proceed. Atomicity closes that window.
Key idea
A concurrent rate limiter must make the check and decrement of tokens atomic, and a token bucket bounds the rate while allowing bursts up to its capacity.