System Design

Error Budgets and Policy

Using the gap between perfect and your SLO as a shared currency for risk.

The budget hidden in your SLO

If your service level objective (SLO) is 99.9% success, then 0.1% of requests are allowed to fail. That allowance is your error budget. It is permission to be imperfect, spent over the measurement window.
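To make the arithmetic concrete, here is a minimal sketch in Python. The function names and the example figures (one million requests, a 30-day window) are illustrative assumptions, not part of the lesson.

```python
def allowed_failures(slo: float, total_requests: int) -> int:
    """Requests that may fail in the window without breaching the SLO.

    `slo` is a fraction such as 0.999. The result is rounded to the
    nearest whole request to absorb floating-point noise.
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction between 0 and 1, e.g. 0.999")
    return round(total_requests * (1.0 - slo))


def allowed_downtime_minutes(slo: float, window_days: int) -> float:
    """Time-based view of the same budget, assuming an availability SLO."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (1.0 - slo)


# Assumed example: 1,000,000 requests in the window at a 99.9% SLO.
print(allowed_failures(0.999, 1_000_000))              # 1000
# Assumed example: the same SLO over a 30-day window.
print(round(allowed_downtime_minutes(0.999, 30), 1))   # 43.2
```

Under these assumptions, a 99.9% SLO leaves room for 1,000 failed requests per million, or roughly 43 minutes of full downtime in 30 days.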

Why a budget changes behavior

The error budget turns reliability from an argument into accounting. Instead of debating whether a risky launch is acceptable, teams ask whether the budget can pay for it.

  • When the budget is healthy, teams can ship faster and take more risk.
  • When the budget is exhausted, the service has already used up the unreliability its SLO permits, so risky work pauses until the budget recovers.
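
The question "can the budget pay for it" can be sketched as a simple comparison. This is an illustration only: the function names are hypothetical, and the estimate of how many failures a launch might cost is an input the team would have to supply from its own judgment or history.

```python
def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Failures still allowed in the window.

    A negative result means the SLO has already been breached.
    """
    budget = total_requests * (1.0 - slo)
    return budget - failed_requests


def can_afford(remaining: float, expected_failures: float) -> bool:
    """True if a launch's estimated failure cost fits in what is left."""
    return expected_failures <= remaining


# Assumed figures: 99.9% SLO, 1,000,000 requests so far, 400 already failed.
remaining = budget_remaining(0.999, 1_000_000, 400)

print(can_afford(remaining, 250))  # True: a modest launch fits in the budget
print(can_afford(remaining, 800))  # False: this launch would overspend it
```

The value of framing it this way is that both sides of the conversation look at the same number, rather than at competing intuitions about risk.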

The error budget policy

A policy is the agreed rule for what happens at each state. A typical policy freezes feature launches and redirects effort to reliability work once the budget is spent, and unfreezes when it recovers. Writing this down before an incident removes emotion from the decision.
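
A written policy is precise enough to express as a small function. The sketch below encodes only the two-state rule described above (freeze when the budget is spent, unfreeze when it recovers). The `Action` enum and `policy_action` name are hypothetical, and real policies often add intermediate thresholds that are not modeled here.

```python
from enum import Enum


class Action(Enum):
    SHIP = "feature launches allowed"
    FREEZE = "freeze feature launches; redirect effort to reliability work"


def policy_action(remaining_budget: float) -> Action:
    """Two-state policy: freeze once the budget is spent, otherwise ship.

    `remaining_budget` is the number of failures (or minutes) still
    allowed in the current window; zero or below means it is spent.
    """
    return Action.FREEZE if remaining_budget <= 0 else Action.SHIP


print(policy_action(600).value)   # feature launches allowed
print(policy_action(-50).value)   # freeze feature launches; redirect effort to reliability work
```

Because the rule depends only on the remaining budget, the same function "unfreezes" on its own once the budget is positive again, for example when old failures age out of a rolling window.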

Key idea

An error budget is the spendable gap between perfect and your SLO, and a written policy decides how that currency governs risk.

Check yourself

1. What is an error budget?

2. What typically happens when the error budget is exhausted?

3. Why write the error budget policy before an incident?