System Design

Rate Limiting Strategies

Token bucket, leaky bucket, and fixed windows.

Why Limit

Rate limiting protects an API from abuse, runaway clients, and accidental overload. It caps how many requests a caller may make in a given time window and rejects or delays the rest, usually rejecting with a 429 Too Many Requests status.

Common Algorithms

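  • Fixed window counts requests per clock interval (for example, per minute) and resets at each boundary; simple, but a client can send up to twice the limit in a short span that straddles two windows.
  • Sliding window smooths those edges by tracking a rolling interval, either by logging request timestamps or by weighting the previous window's count by how much of it still overlaps.
  • Token bucket refills tokens at a steady rate up to a fixed capacity; each request spends a token, so short bursts up to the capacity are allowed.
  • Leaky bucket queues requests and drains them at a constant rate to smooth output; requests that arrive when the queue is full are dropped.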
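
To make the token bucket concrete, here is a minimal single-process sketch. The class name `TokenBucket`, its parameters (`rate`, `capacity`, `cost`), and the injectable `clock` are illustrative assumptions, not part of any particular library; a production limiter would also need locking or a shared store.

```python
import time


class TokenBucket:
    """Illustrative token bucket: refills `rate` tokens per second up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.clock = clock            # injectable so the bucket can be tested
        self.tokens = float(capacity) # start full so an initial burst is allowed
        self.last = clock()

    def allow(self, cost=1.0):
        """Spend `cost` tokens if available; return True if the request may proceed."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill lazily based on elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With `rate=1` and `capacity=3`, a caller can make three requests back to back, is then refused, and regains one request per second after that. The capacity sets the burst size, while the rate sets the long-run average.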

Where to Enforce

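Limits usually live at the API gateway so every backend is protected uniformly. Track counters per API key or per IP in a fast shared store such as Redis so all gateway nodes agree on the count. Tell clients their remaining quota with response headers, such as the conventional X-RateLimit-Remaining, and include Retry-After on a 429 so they know when to try again.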
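
The gateway check described above might look roughly like the following sketch. It uses a fixed window for brevity, and a plain dictionary stands in for the shared store; in a real deployment that would be something like Redis with an atomic increment and a key expiry. The names `check`, `LIMIT`, and `WINDOW` are hypothetical, and the `X-RateLimit-*` headers are a common convention rather than a formal standard.

```python
import math
import time

LIMIT = 5    # hypothetical quota: requests allowed per window
WINDOW = 60  # window length in seconds

# Stand-in for a shared store such as Redis. Old windows are never
# evicted here; a real store would expire each key after the window ends.
counters = {}


def check(api_key, now=None):
    """Return (status, headers) for one request made with `api_key`."""
    now = time.time() if now is None else now
    window_id = int(now // WINDOW)
    key = (api_key, window_id)

    # Count this request against the caller's current window.
    count = counters.get(key, 0) + 1
    counters[key] = count

    headers = {
        "X-RateLimit-Limit": str(LIMIT),
        "X-RateLimit-Remaining": str(max(0, LIMIT - count)),
    }
    if count > LIMIT:
        # Tell the client how many seconds remain until the window resets.
        reset_at = (window_id + 1) * WINDOW
        headers["Retry-After"] = str(math.ceil(reset_at - now))
        return 429, headers
    return 200, headers
```

Keying the counter on both the caller and the window number means every gateway node that reads the same store sees the same count, which is the reason for keeping the state outside the gateway process.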

Key idea

Rate limiters such as token bucket cap request rates at the gateway and reply with 429 when exceeded, protecting backends while allowing controlled bursts.

Check yourself

1. Which algorithm allows short bursts by refilling tokens at a steady rate?

2. Which status code signals a client has been rate limited?