quiz vs the machine

Gold1450

System Design

Design a Rate Limiter

Cap how many requests a client may make in a window across many servers.

6 min read · core · beat Gold to climb

Requirements

Limit each client to a fixed number of requests per time window.
Decisions must be fast and consistent across many API servers.
Reject excess requests cleanly with a retry hint.

High level design

A limiter sits in front of the service, keying on user id or IP. The common algorithms trade accuracy for memory.

Token bucket: tokens refill at a steady rate and each request spends one, allowing bursts up to the bucket size.
Sliding window: counts requests in the recent window for smoother limits without edge spikes.
Fixed window: a simple counter per window that is cheap but allows double bursts at boundaries.

A shared store such as a fast in memory cache holds the counters so every server sees the same state.

Bottlenecks

Shared counter latency: every request reads and writes the store, so colocate the cache and use atomic increments.
Race conditions: read then write is not atomic, so use a single atomic operation or a small script that runs server side.
Distributed accuracy: local per node limits drift, so a central store keeps the global count honest at the cost of one network hop.

Return a clear status code with a retry after header so clients back off instead of hammering.

Key idea

A rate limiter is a fast shared counter problem, and the algorithm choice balances burst tolerance against memory and accuracy.

Check yourself

Answer to earn rating on the learn ladder.

1. What advantage does token bucket give over fixed window?

2. Why use an atomic increment in the shared store?