System Design

Rate Limiting in the Mesh

Protecting services from overload with local and global request limits at the proxy.

Guarding Capacity

Rate limiting caps how many requests a service accepts in a given time window, protecting it from overload, abuse, and noisy neighbors. The mesh enforces limits at the proxy, before traffic reaches your app.

Local vs Global

  • Local rate limiting runs entirely in each proxy. It is fast and needs no coordination, but each proxy counts independently, so the total the fleet admits grows with the number of replicas.
  • Global rate limiting consults a shared service so a limit applies across every replica together.

Local limits suit per-instance protection. Global limits suit a true fleet-wide quota, such as one thousand requests per second per customer no matter which pod serves them.
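The difference can be seen in a small simulation. This is only a sketch under simplifying assumptions: a single counting window, round-robin load balancing, and hypothetical names (LocalLimiter, SharedCounter, simulate) that stand in for the proxy and the shared service rather than any real mesh API.

```python
from collections import Counter


class LocalLimiter:
    """Hypothetical per-proxy limiter: each proxy keeps its own counter."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def allow(self):
        if self.count < self.limit:
            self.count += 1
            return True
        return False


class SharedCounter:
    """Hypothetical stand-in for the shared service every proxy consults."""

    def __init__(self, limit):
        self.limit = limit
        self.counts = Counter()

    def allow(self, key):
        if self.counts[key] < self.limit:
            self.counts[key] += 1
            return True
        return False


def simulate(requests, replicas, limit):
    """Send `requests` round-robin across `replicas` within one window.

    Returns (admitted under local limits, admitted under a global limit).
    """
    local = [LocalLimiter(limit) for _ in range(replicas)]
    shared = SharedCounter(limit)
    local_ok = 0
    global_ok = 0
    for i in range(requests):
        proxy = i % replicas  # assumed round-robin load balancing
        if local[proxy].allow():
            local_ok += 1
        if shared.allow("customer-a"):
            global_ok += 1
    return local_ok, global_ok


print(simulate(requests=30, replicas=3, limit=5))
```

With three replicas and a limit of five, the local limiters admit fifteen requests in total while the shared counter admits five, which is why a fleet-wide quota needs the global form.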

The Algorithm

Many proxies use a token bucket. Tokens refill at a steady rate up to a fixed capacity, each request spends one, and a request that finds the bucket empty is rejected with HTTP 429 (Too Many Requests). The capacity allows short bursts while the refill rate bounds the sustained rate.
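A token bucket can be sketched in a few lines. This is a minimal illustration, not any particular proxy's implementation: the class name TokenBucket, its parameters, and the injectable clock are assumptions made for the example, and it is not thread-safe.

```python
import time


class TokenBucket:
    """Minimal token bucket sketch (illustrative names, not a proxy API)."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.clock = clock            # injectable so the sketch is testable
        self.tokens = float(capacity) # start with a full bucket
        self.last = clock()

    def allow(self):
        """Return True if the request may pass, False if it should get a 429."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill for the time elapsed, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Usage: allow a burst of 3, then refill at 2 tokens per second.
bucket = TokenBucket(rate=2, capacity=3)
results = [bucket.allow() for _ in range(3)]
```

Refilling lazily on each call avoids a background timer: the bucket only needs the time of the previous request to work out how many tokens have accrued.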

Why at the Mesh

Putting limits in the proxy means rejected traffic never touches application threads, and the policy is uniform across services. The app stays simple while the platform enforces fairness.

Key idea

The mesh enforces rate limits at the proxy using token buckets, with local limits per instance and global limits coordinated across replicas, so overload is rejected before it reaches your app.

Check yourself

1. What distinguishes global from local rate limiting?

2. Why enforce rate limiting at the proxy rather than in the app?