Rate Limiting Algorithms: Token Bucket vs Leaky Bucket vs Sliding Window
A clear, practical comparison of the four rate-limiting algorithms — token bucket, leaky bucket, fixed window, and sliding window — with the trade-offs that decide which one to ship.
Every API that survives contact with real traffic ends up needing a rate limiter. It's your bouncer: it protects the database from stampedes, enforces billing tiers, and blunts abuse. The interesting part is that four common algorithms all do the job differently, and the differences matter the day a customer asks why they got a 429.
Why you need one
A rate limiter answers a deceptively simple question: has this client made too many requests in some window of time? The subtlety is in how you define "too many" and "window" — and whether you allow short bursts or smooth traffic flat. Let's take the four in order.
Fixed window counter
The naive approach: bucket time into fixed intervals (say, one minute) and count requests per client per bucket. Hit the limit, reject the rest until the window rolls over.
It's trivial to implement with a single counter and a TTL. The flaw is the boundary burst: a client can send the full limit at the very end of one window and again at the very start of the next, briefly doing double the intended rate across the seam.
Sliding window
The sliding window fixes the boundary problem by tracking a rolling time range instead of a fixed one. Two flavors:
- Sliding window log: store a timestamp per request and count how many fall inside the trailing window. Exact, but memory grows with traffic.
- Sliding window counter: approximate by weighting the previous window's count by how much of it still overlaps. Cheap, smooth, and accurate enough for almost everyone. This is the pragmatic production default.
Token bucket
A bucket holds tokens up to a capacity and refills at a steady rate. Each request spends one token; an empty bucket means rejection. The magic is that it allows bursts up to the bucket size while enforcing an average rate over time — exactly what you want for bursty, human-driven APIs.
Leaky bucket
The leaky bucket is the mirror image. Requests enter a queue and drain at a constant rate, like water leaking from a bucket with a hole. Overflow when the queue is full means rejection.
The defining property: it smooths traffic into a steady outflow regardless of how spiky the input was. That's ideal when a downstream service needs a predictable, flat request rate — but it can add latency, since requests wait in the queue rather than firing immediately.
How to choose
- Allow bursts, enforce an average: token bucket. The right call for most user-facing APIs.
- Force a smooth, constant outflow: leaky bucket. Best when protecting a fragile downstream.
- Need accuracy without boundary abuse: sliding window counter.
- Quick and dirty, low stakes: fixed window — just know about the seam.
One more production note: in a multi-server deployment, the limiter state must be shared, usually in Redis, so a client can't dodge the limit by hitting different nodes. An atomic check-and-decrement keeps it correct under concurrency.
The windowing logic here is closely related to interval reasoning — if rolling windows feel fuzzy, the merge intervals problem builds the exact muscle. When you're ready to implement one end to end, the rate limiter challenge makes it concrete, and the System Design lessons place it in a real architecture.
Ship it, then defend it
Reading the trade-offs is the easy half. Implement a limiter, then defend your choice against an AI interviewer at your tier in the arena — start from the roadmaps and see if you can still beat the machine.