Retry and Timeout Policies

Setting bounded retries and deadlines in the mesh without overloading downstreams.

Reliability With Limits

Retries and timeouts make calls more reliable, but used carelessly they can turn a small failure into an outage. The mesh lets you set both as policy and adds guardrails the application might forget.

Timeouts First

Every call should have a timeout. Without one, a stuck downstream can hold the caller's thread or connection indefinitely. The mesh enforces a deadline per route, so a hung dependency returns an error instead of leaking resources.
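
To make the idea of a deadline concrete, here is a minimal application-side sketch in Python. It is not the mesh's implementation; the names call_with_deadline, hung_dependency, and fast_dependency are hypothetical and exist only for illustration.

```python
import asyncio


async def call_with_deadline(make_call, timeout_s):
    """Await make_call(), but give up after timeout_s seconds."""
    try:
        return await asyncio.wait_for(make_call(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Turn the hang into an explicit error the caller can handle.
        raise TimeoutError(f"deadline of {timeout_s}s exceeded") from None


async def hung_dependency():
    # Hypothetical downstream that never answers in a useful time.
    await asyncio.sleep(3600)
    return "ok"


async def fast_dependency():
    # Hypothetical downstream that answers immediately.
    return "ok"
```

Calling asyncio.run(call_with_deadline(hung_dependency, 0.05)) raises TimeoutError after roughly 50 ms instead of waiting for the hour-long sleep, and the pending call is cancelled rather than left running.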

Retries Done Right

Retry only idempotent operations, since retrying a write can double an effect.
Cap the number of attempts so a failure does not multiply load.
Use backoff with jitter to spread retries out in time.
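
The three rules above can be sketched together in a few lines of Python. This is an illustrative sketch, not the mesh's retry logic: TransientError, backoff_delays, and call_with_retries are hypothetical names, the default values are arbitrary, and the jitter scheme shown (a uniform random delay up to an exponentially growing cap) is one common choice among several.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def backoff_delays(max_attempts, base_s=0.1, cap_s=2.0, rng=random):
    """Yield the sleep before each retry.

    Each delay is uniform in [0, min(cap_s, base_s * 2**n)], so clients
    that failed at the same moment do not all retry at the same moment.
    """
    for n in range(max_attempts - 1):
        yield rng.uniform(0, min(cap_s, base_s * 2 ** n))


def call_with_retries(op, idempotent, max_attempts=3, sleep=time.sleep):
    """Call op(), retrying transient failures only if op is idempotent."""
    # A non-idempotent operation gets exactly one attempt.
    attempts = max_attempts if idempotent else 1
    delays = backoff_delays(attempts)
    for attempt in range(1, attempts + 1):
        try:
            return op()
        except TransientError:
            if attempt == attempts:
                raise  # attempt cap reached: surface the failure
            sleep(next(delays))
```

Passing idempotent=False for a write means the first failure is returned to the caller, which avoids applying the effect twice.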

The dangerous case is the retry storm. If each of three stacked layers makes up to three attempts, one failing call at the bottom can be tried 3 × 3 × 3 = 27 times, so a single failure balloons into many times the normal traffic. The mesh supports a retry budget that limits retries to a fraction of active requests.
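
A retry budget can be sketched as a small counter. The class below is a hypothetical illustration of the idea, not the mesh's implementation: the RetryBudget name, the ratio and min_retries parameters, and their defaults are all assumptions, and the code is not thread-safe.

```python
class RetryBudget:
    """Allow retries only while they stay under a fraction of active requests."""

    def __init__(self, ratio=0.2, min_retries=1):
        self.ratio = ratio              # assumed: retries may be 20% of active requests
        self.min_retries = min_retries  # assumed floor so low traffic can still retry
        self.active_requests = 0
        self.active_retries = 0

    def start_request(self):
        self.active_requests += 1

    def finish_request(self):
        self.active_requests -= 1

    def try_start_retry(self):
        """Return True and reserve a slot if the budget allows one more retry."""
        limit = max(self.min_retries, int(self.ratio * self.active_requests))
        if self.active_retries >= limit:
            return False  # budget exhausted: fail fast instead of adding load
        self.active_retries += 1
        return True

    def finish_retry(self):
        self.active_retries -= 1
```

With 10 active requests and a ratio of 0.2, at most 2 retries can be in flight at once. A third retry is refused until one of the first two finishes, so retry traffic stays proportional to real traffic even when every request is failing.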

Key idea

The mesh enforces per-route timeouts and bounded, idempotent-only retries with backoff and a retry budget, so a single failure cannot snowball into a retry storm.
