System Design

Latency Budget Allocation

Dividing an end-to-end response target across the hops so that no single stage blows the budget.

A budget to spend

A latency budget is the total response time you promise, split across the stages a request passes through. If the goal is 200 milliseconds end to end, the time spent in all hops combined must fit under that ceiling.

Allocating the slices

  • List the stages: network, load balancer, service, database, and serialization.
  • Give each a share, leaving slack for variance.

For example, a 200 millisecond target might allocate 20 milliseconds to network, 50 to the database, 100 to service logic, and 30 as a buffer.
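A minimal sketch of that allocation as data, assuming the four slices from the example. The names `budget_ms`, `check_budget`, and `over_budget_stages` are hypothetical helpers made up for illustration, not part of any library.

```python
# Hypothetical budget: stage names and numbers mirror the example above.
TARGET_MS = 200

budget_ms = {
    "network": 20,
    "database": 50,
    "service_logic": 100,
    "buffer": 30,
}


def check_budget(budget, target_ms):
    """Return the unallocated milliseconds; raise if the slices exceed the target."""
    allocated = sum(budget.values())
    if allocated > target_ms:
        raise ValueError(f"over budget by {allocated - target_ms} ms")
    return target_ms - allocated


def over_budget_stages(budget, measured_ms):
    """List the stages whose measured latency exceeds their slice."""
    return [stage for stage, ms in measured_ms.items() if ms > budget.get(stage, 0)]


remaining_ms = check_budget(budget_ms, TARGET_MS)
print("unallocated ms:", remaining_ms)
```

Feeding measured per-stage timings into `over_budget_stages` shows which stage is eating into the buffer, which is usually the first question when the end-to-end target is missed.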

Mind the tail

Budgets must hold at the tail, not just the average. A service is judged on p99, the time under which 99 percent of requests complete, because the slow tail is what users feel.
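To see why the average hides the tail, here is a small sketch using made-up latency samples and a nearest-rank percentile. The `percentile` helper and the sample values are assumptions for illustration; production systems typically read percentiles from a metrics library or histogram instead.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct percent
    of all samples at or below it. pct is an integer from 1 to 100."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Made-up samples: 98 fast requests and 2 slow ones.
latencies_ms = [40] * 98 + [900] * 2

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

print("mean:", mean_ms, "p50:", p50_ms, "p99:", p99_ms)
```

With these samples the mean and median both look comfortably inside a 200 millisecond budget, while p99 lands on the 900 millisecond requests that the slowest users actually experience.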

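When many services are chained, their tails compound: a request that touches ten services is exposed to each one's slow tail. If each service is slow on 1 percent of calls and the services behave independently, roughly 10 percent of such requests hit at least one slow call.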
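The compounding can be put in numbers with a short sketch. It assumes the services' latencies are independent, which real systems only approximate, and `chance_of_slow_hop` is a hypothetical helper name.

```python
def chance_of_slow_hop(num_services, quantile=0.99):
    """Probability that at least one call lands beyond its quantile,
    assuming every service's latency is independent of the others."""
    return 1 - quantile ** num_services


for n in (1, 10, 100):
    print(n, "services:", round(chance_of_slow_hop(n), 3))
```

Under that assumption, one service exceeds its p99 on 1 percent of requests, ten chained services do so on about 10 percent, and a hundred on well over half, so per-service budgets often need to be set at a stricter percentile than the end-to-end target.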

Key idea

A latency budget splits an end-to-end target across stages and must be met at the tail, since chained tails compound.

Check yourself

1. What does a latency budget do?

2. Why care about p99 latency rather than the average?