A budget to spend
A latency budget is the total response time you promise, split across the stages a request passes through. If the goal is 200 milliseconds end to end, the time spent in every hop, added together, must fit under that ceiling.
Allocating the slices
- List the stages: network, load balancer, service, database, and serialization.
- Give each a share, leaving slack for variance.
For example, a 200 millisecond budget might give 20 to the network, 50 to the database, and 100 to service logic, keeping 30 as a buffer.
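The allocation above can be written down and checked mechanically. The following is a minimal sketch using the example figures from the text. The names `TARGET_MS`, `budget_ms`, `check_budget`, and `over_budget_stages` are hypothetical and chosen only for illustration.

```python
# Example figures from the text, in milliseconds (illustrative, not prescriptive).
TARGET_MS = 200
budget_ms = {"network": 20, "database": 50, "service": 100, "buffer": 30}


def check_budget(budget, target):
    """Return the unallocated milliseconds; raise if the slices exceed the target."""
    allocated = sum(budget.values())
    if allocated > target:
        raise ValueError(f"over budget by {allocated - target} ms")
    return target - allocated


def over_budget_stages(measured, budget):
    """Map each stage that exceeded its slice to the overshoot in milliseconds.

    A stage with no slice in the budget is treated as having a slice of 0.
    """
    return {
        stage: took - budget.get(stage, 0)
        for stage, took in measured.items()
        if took > budget.get(stage, 0)
    }


print(check_budget(budget_ms, TARGET_MS))  # 0: the slices use the whole target
print(over_budget_stages({"network": 25, "database": 40}, budget_ms))
```

Keeping the buffer as an explicit entry makes it visible when a new stage eats into the slack instead of into another stage's share.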
Mind the tail
Budgets must hold at the tail, not just the average. A service is judged on p99, the time under which 99 percent of requests complete, because the slow tail is what users feel.
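To make the gap between average and tail concrete, here is a small sketch that computes both from a list of latency samples. It assumes the nearest-rank definition of a percentile (other definitions interpolate and give slightly different values), and the sample data is made up for illustration.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Made-up data: 98 fast requests and two slow ones, in milliseconds.
latencies_ms = [40] * 98 + [150, 900]

mean_ms = sum(latencies_ms) / len(latencies_ms)
p99_ms = percentile(latencies_ms, 99)

print(f"mean = {mean_ms:.1f} ms, p99 = {p99_ms} ms")
```

With these samples the mean is 49.7 ms while p99 is 150 ms, so a stage with a 100 millisecond slice would look healthy on average and still miss its budget at the tail.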
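When many services are chained, their tail risks compound: a request that touches ten services has ten chances to land in a slow tail. If each service independently answers within its p99 threshold 99 percent of the time, roughly one request in ten still hits at least one slow hop.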
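The compounding can be estimated with a one-line formula. This sketch assumes each service's slow responses are independent of the others, which real systems often violate (shared databases, correlated load), so treat the result as a rough guide. The function name `fraction_hitting_a_tail` is hypothetical.

```python
def fraction_hitting_a_tail(n_services, quantile=0.99):
    """Fraction of requests that exceed the given quantile threshold in at
    least one of n_services hops, ASSUMING the hops are independent."""
    return 1 - quantile ** n_services


for n in (1, 5, 10, 50):
    print(f"{n:>2} services: {fraction_hitting_a_tail(n):.1%} of requests hit a slow hop")
```

For ten services the result is about 9.6 percent, which is why each stage in a long chain needs a tighter tail target than the end-to-end promise alone suggests.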
Key idea
A latency budget splits an end-to-end target across stages and must be met at the tail, since chained tails compound.