A Budget for Time
A latency budget is a target for how long a user-facing action may take, divided into shares for each stage that handles it. Treating latency as a budget holds every component to its allotment instead of hoping the total turns out acceptable.
Splitting the Budget
A request passes through many stages, and each consumes part of the budget.
- The network spends round trips on DNS lookup, connection setup, and data transfer.
- The server spends time on queuing, computation, and downstream calls.
- The client spends time parsing and rendering the response.
If the goal is two hundred milliseconds, the time spent in all stages must sum to no more than that, so each stage is assigned a slice and held to it.
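A minimal sketch of splitting a two hundred millisecond budget and checking stages against their slices follows. The stage names, the 60/100/40 ms split, the measured values, and the helpers `check_allocation` and `over_budget` are all hypothetical, chosen only to illustrate the bookkeeping.

```python
# Hypothetical end-to-end target and per-stage slices, in milliseconds.
BUDGET_MS = 200

ALLOCATIONS_MS = {
    "network": 60,   # assumed slice for lookup, connection, transfer
    "server": 100,   # assumed slice for queuing, computation, downstream calls
    "client": 40,    # assumed slice for parsing and rendering
}


def check_allocation(budget_ms: float, allocations: dict[str, float]) -> float:
    """Return the unallocated slack; raise if the slices exceed the budget."""
    total = sum(allocations.values())
    if total > budget_ms:
        raise ValueError(f"slices sum to {total} ms, over the {budget_ms} ms budget")
    return budget_ms - total


def over_budget(measured: dict[str, float], allocations: dict[str, float]) -> list[str]:
    """Return the stages whose measured latency exceeds their assigned slice."""
    return [stage for stage, ms in measured.items() if ms > allocations[stage]]


if __name__ == "__main__":
    slack = check_allocation(BUDGET_MS, ALLOCATIONS_MS)
    print(f"unallocated slack: {slack} ms")
    # Hypothetical measurements for one request.
    measured = {"network": 55, "server": 130, "client": 30}
    print("stages over their slice:", over_budget(measured, ALLOCATIONS_MS))
```

Checking each stage against its own slice, rather than only the total, points directly at the component that overspent (here the server), which is the accountability the budget is meant to create.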
Tails Over Averages
Budgets must target tail latency, not the average, because users judge a service by its slow requests. A stage that is usually fast but occasionally very slow can blow the budget even while its average looks healthy, so designers track high percentiles such as the ninety-ninth and add timeouts and fallbacks to bound the worst case.
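The gap between average and tail can be seen with a small sketch on synthetic data: ninety-eight fast requests and two very slow ones. It uses the nearest-rank percentile definition, and it models a timeout with fallback simply as capping each request at the timeout, assuming the fallback answer itself costs nothing. The sample values, the 100 ms timeout, and the helpers `percentile` and `with_timeout` are illustrative assumptions, not a prescribed method.

```python
import math
import statistics


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need at least one sample and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def with_timeout(samples: list[float], timeout_ms: float) -> list[float]:
    """Model a timeout plus fallback: any call slower than timeout_ms is abandoned
    at timeout_ms and answered from a fallback (assumed to add no extra time)."""
    return [min(s, timeout_ms) for s in samples]


if __name__ == "__main__":
    # Synthetic latencies in ms: mostly fast, occasionally very slow.
    samples = [20.0] * 98 + [900.0] * 2
    print(f"mean={statistics.mean(samples):.1f} ms "
          f"p50={percentile(samples, 50)} ms p99={percentile(samples, 99)} ms")
    bounded = with_timeout(samples, 100.0)
    print(f"with a 100 ms timeout: p99={percentile(bounded, 99)} ms")
```

Against a two hundred millisecond budget, the mean of this data sits well inside it while the ninety-ninth percentile is far outside; capping slow calls with the timeout brings the tail back under the budget, at the price of serving a fallback for those requests.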
Key idea
A latency budget allocates a delay target across network, server, and client stages, and because users feel the slow requests it must be enforced against tail percentiles rather than averages.