Rate Limiting in the Mesh

Protecting services from overload with local and global request limits at the proxy.

Guarding Capacity

Rate limiting caps how many requests a service accepts within a time window, protecting it from overload, abuse, and noisy neighbors. The mesh enforces limits at the proxy, before traffic reaches your app.

Local vs Global

Local rate limiting runs entirely in each proxy. It is fast and needs no coordination, but each proxy counts independently, so the total admitted across the fleet grows with the number of replicas.
Global rate limiting consults a shared service, so a single limit applies across all replicas together.

Local limits suit per-instance protection. Global limits suit a true fleet-wide quota, like one thousand requests per second per customer no matter which pod serves them.
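The difference is easiest to see with numbers. The sketch below is only an illustration: it uses a simple per-window counter rather than any real proxy's implementation, spreads requests round-robin, and stands in for the shared service with an in-process object. The names FixedWindowCounter, admitted_local, and admitted_global are hypothetical.

```python
class FixedWindowCounter:
    """Admits up to `limit` requests in one window (illustrative only)."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def allow(self):
        # Admit the request only while the window's budget is not used up.
        if self.count < self.limit:
            self.count += 1
            return True
        return False


def admitted_local(replicas, limit, requests):
    """Each replica keeps its own counter; requests arrive round-robin."""
    counters = [FixedWindowCounter(limit) for _ in range(replicas)]
    return sum(counters[i % replicas].allow() for i in range(requests))


def admitted_global(replicas, limit, requests):
    """All replicas consult one shared counter (assumed stand-in for the
    shared rate limit service; a real one would be a network call)."""
    shared = FixedWindowCounter(limit)
    return sum(shared.allow() for _ in range(requests))
```

With 4 replicas, a limit of 1000, and 10000 requests in one window, admitted_local lets 4000 through while admitted_global lets 1000 through, which is why a fleet-wide quota needs the shared counter.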

The Algorithm

Most proxies use a token bucket. Tokens refill at a steady rate up to the bucket's capacity, each request spends one, and an empty bucket means requests are rejected with an HTTP 429 Too Many Requests status. This allows short bursts, up to the bucket size, while bounding the sustained rate.
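A minimal sketch of a token bucket, assuming a single-threaded caller and lazy refill computed on each request. The class name TokenBucket, its parameters, and the handle helper are hypothetical, chosen for illustration rather than taken from any particular proxy.

```python
import time


class TokenBucket:
    """Token bucket: refills at a steady rate, each request spends one token."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate              # tokens added per second (sustained rate)
        self.capacity = capacity      # maximum tokens held (burst size)
        self.clock = clock            # injectable clock, handy for testing
        self.tokens = float(capacity) # start with a full bucket
        self.last = clock()

    def allow(self):
        # Refill lazily: add the tokens earned since the last call,
        # never exceeding the bucket's capacity.
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        # Spend one token if available; otherwise reject.
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def handle(bucket):
    """Return the HTTP status a proxy would send for one request."""
    return 200 if bucket.allow() else 429
```

With rate=2 and capacity=3, a burst of four back-to-back requests admits the first three and rejects the fourth; one second later two more tokens have accrued, so two further requests pass. A real proxy would also need locking or atomics, since many worker threads share one bucket.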

Why at the Mesh

Putting limits in the proxy means rejected traffic never touches application threads, and the policy is uniform across services. The app stays simple while the platform enforces fairness.

Key idea