Why limit at the edge
Rate limiting caps how many requests a client may make within a given time window. Enforcing it at the edge stops floods before they reach the origin.
- It protects backends from abuse, scraping, and accidental loops.
- It enforces fair use across tenants on shared infrastructure.
- Edge enforcement saves origin capacity for legitimate traffic.
How a limiter decides
The edge looks up a counter for the client key, compares it with the configured limit, and either passes or rejects the request.
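As a rough illustration of that pass-or-reject decision, here is a minimal sketch using a per-window counter. The names `LIMIT`, `WINDOW_SECONDS`, and `allow_request` are hypothetical, and the in-memory dictionary stands in for whatever store a real edge would use; old entries are never evicted here.

```python
import time
from collections import defaultdict

LIMIT = 5            # hypothetical: max requests per client per window
WINDOW_SECONDS = 60  # hypothetical: window length in seconds

# Maps (client_key, window_index) -> request count.
# In-memory and never pruned; for illustration only.
_counters = defaultdict(int)

def allow_request(client_key, now=None):
    """Return True to pass the request, False to reject it."""
    now = time.time() if now is None else now
    # All timestamps in the same interval share one window index.
    window_index = int(now // WINDOW_SECONDS)
    bucket = (client_key, window_index)
    if _counters[bucket] >= LIMIT:
        return False
    _counters[bucket] += 1
    return True
```

Passing `now` explicitly makes the decision easy to exercise without waiting on the clock; a real deployment would also need to expire old windows.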
Common algorithms
- Token bucket refills tokens at a steady rate and allows short bursts.
- Fixed window counts requests per interval, but a client can send close to twice the limit in a short burst that straddles a window boundary.
- Sliding window smooths those boundary spikes for fairer limits.
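To make the first of these concrete, the following is a minimal token bucket sketch, not a production implementation. The class name `TokenBucket` and the parameters `rate` and `capacity` are assumptions chosen for illustration; it tracks a single client and is not thread-safe.

```python
import time

class TokenBucket:
    """Single-client token bucket (illustrative; not thread-safe)."""

    def __init__(self, rate, capacity, now=None):
        self.rate = rate                # tokens added per second
        self.capacity = capacity        # maximum burst size
        self.tokens = float(capacity)   # start full so an initial burst is allowed
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Consume one token if available; return True if the request passes."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)
        # Refill at a steady rate, but never beyond the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With `rate=1.0` and `capacity=3`, a client can send three requests at once, after which it is held to about one request per second until the bucket refills. The capacity bounds the burst and the rate bounds sustained load.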
Practical notes
- Key limits by API token, user, or IP address, choosing an identifier you can trust; an IP address may be shared by many clients behind NAT or a proxy.
- Return a clear status, typically HTTP 429 (Too Many Requests), with a Retry-After header so clients know when to back off.
- Keep counters in a fast shared store so all edge nodes agree.
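The first two notes can be sketched as follows. The helper names `client_key` and `rejection_response` are hypothetical, and the preference order (token, then user, then IP) is one assumption about what is most trustworthy, not a rule. The rejection uses HTTP 429 with a `Retry-After` header, which is a standard way to signal rate limiting.

```python
import math

def client_key(api_token, user_id, remote_ip):
    """Pick a limiter key, preferring identifiers that are harder to spoof or share.

    The ordering here is an assumption for illustration.
    """
    if api_token:
        return f"token:{api_token}"
    if user_id:
        return f"user:{user_id}"
    return f"ip:{remote_ip}"

def rejection_response(seconds_until_reset):
    """Build a (status, headers, body) tuple for a rate-limited request."""
    # Retry-After takes whole seconds; round up so clients do not retry too early.
    retry_after = max(1, math.ceil(seconds_until_reset))
    return 429, {"Retry-After": str(retry_after)}, "Too Many Requests\n"
```

For example, `rejection_response(12.3)` yields status 429 with `Retry-After: 13`. The shared counter store from the last note is left out here, since its API depends on the store chosen.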
Key idea
Rate limiting at the edge shields your origin by rejecting excess traffic early, using algorithms like token bucket to allow bursts while capping sustained load.