Rate Limiting at the Edge

Limiting Close to the Source

Rate limiting caps how many requests a client may send within a given time window. Enforcing it at the edge rejects excess traffic close to the user, so abusive load never reaches the origin.

Common Algorithms

Fixed window counts requests per time bucket; it is simple but allows bursts at window boundaries
Sliding window smooths out the boundary problem of fixed windows
Token bucket refills tokens at a steady rate and allows short bursts
Leaky bucket drains at a constant rate, enforcing a smooth output
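As an illustration of the token bucket entry above, here is a minimal single-process sketch. The class name TokenBucket, its parameters (rate, capacity, cost), and the injectable clock are assumptions made for this example; they do not come from any particular edge platform.

```python
import time


class TokenBucket:
    """Token bucket: refills at `rate` tokens per second, holds at most `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so the bucket can be tested
        self.tokens = capacity      # start full so an initial burst is allowed
        self.last = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Credit the tokens earned since the last call, capped at capacity.
        earned = (now - self.last) * self.rate
        self.tokens = min(self.capacity, self.tokens + earned)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The capacity sets how large a burst is tolerated, while the rate sets the long-run average. A bucket with rate=1.0 and capacity=3 admits three back-to-back requests, then one per second after that.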

The Distributed Challenge

Edge nodes are spread worldwide, so enforcing a single global limit requires shared state. The options trade accuracy against latency.

Local counters are fast but let a client exceed the global cap by spreading requests across nodes
Synced counters share state through a fast store; they are more accurate but add latency to each check
Approximate counting tolerates small counting errors in exchange for speed
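The gap between local and synced counters can be shown with a small simulation. This is a sketch under stated assumptions: LocalLimiter, SharedLimiter, and admitted are hypothetical names, a plain dict stands in for the shared store, and window expiry is left out to keep the comparison short. A real shared store would also need an atomic increment and a network round trip, which is where the extra latency comes from.

```python
class LocalLimiter:
    """Each edge node keeps its own counter; nothing is shared."""

    def __init__(self, limit):
        self.limit = limit
        self.counts = {}

    def allow(self, client):
        n = self.counts.get(client, 0)
        if n >= self.limit:
            return False
        self.counts[client] = n + 1
        return True


class SharedLimiter:
    """Every node checks one shared counter (a dict stands in for the store)."""

    def __init__(self, limit, store):
        self.limit = limit
        self.store = store  # assumption: same object handed to every node

    def allow(self, client):
        n = self.store.get(client, 0)
        if n >= self.limit:
            return False
        self.store[client] = n + 1
        return True


def admitted(nodes, client, requests_per_node):
    """Send the same number of requests to every node; count how many pass."""
    return sum(
        node.allow(client) for node in nodes for _ in range(requests_per_node)
    )
```

With a limit of 10 and three nodes, a client that sends 10 requests to each node gets 30 through the local limiters but only 10 through the shared one.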

Responding

Return a clear 429 (Too Many Requests) status with a retry hint, such as a Retry-After header, so well-behaved clients back off rather than hammer harder.
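A rejection might be assembled as follows. This is a framework-neutral sketch: the function name too_many_requests and the (status, headers, body) return shape are assumptions for illustration, and a real edge runtime will have its own response type.

```python
import math
from http import HTTPStatus


def too_many_requests(retry_after_seconds):
    """Build a 429 response as a (status, headers, body) tuple."""
    # Retry-After takes whole seconds; round up so clients never retry early,
    # and never advertise a wait shorter than one second.
    wait = max(1, math.ceil(retry_after_seconds))
    status = HTTPStatus.TOO_MANY_REQUESTS
    headers = {
        "Retry-After": str(wait),
        "Content-Type": "text/plain; charset=utf-8",
    }
    body = f"Rate limit exceeded. Retry in {wait} seconds.\n"
    return status.value, headers, body
```

The wait value would typically come from the limiter itself, for example the time until the next token is available or until the current window ends.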

Key idea

Edge rate limiting rejects excess requests near the user using algorithms like token bucket, but enforcing a global cap across many edge nodes forces a trade-off between the accuracy of shared state and latency.
