The on-call rotation
Someone has to answer when production breaks at three in the morning. An on-call rotation assigns that responsibility to one person at a time and cycles it through the team so no one carries it alone.
How it works
- A schedule names who is on call for each window, often a week
- An escalation policy pages a backup if the primary does not acknowledge in time
- A runbook gives steps to triage common alerts so the responder is not starting blind
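The schedule and escalation policy above can be sketched in a few lines of Python. This is a minimal illustration under assumptions of our own, not any paging tool's API: a fixed roster cycled in one-week windows, a backup who is simply the next person in the roster, and a 15-minute acknowledgement timeout. The names `Rotation`, `primary_at`, `backup_at`, and `paged_so_far` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Rotation:
    engineers: list[str]                 # hypothetical roster, in rotation order
    start: datetime                      # when the first engineer's window begins
    window: timedelta = timedelta(weeks=1)

    def _slot(self, when: datetime) -> int:
        # Number of whole windows elapsed since the rotation started.
        return (when - self.start) // self.window

    def primary_at(self, when: datetime) -> str:
        # Cycle through the roster one window at a time.
        return self.engineers[self._slot(when) % len(self.engineers)]

    def backup_at(self, when: datetime) -> str:
        # Assumed convention: the backup is the next person in the roster.
        return self.engineers[(self._slot(when) + 1) % len(self.engineers)]


def paged_so_far(rotation: Rotation, alert_time: datetime, now: datetime,
                 acknowledged: bool,
                 ack_timeout: timedelta = timedelta(minutes=15)) -> list[str]:
    # The primary is always paged first.
    targets = [rotation.primary_at(alert_time)]
    # Escalate to the backup only if nobody acknowledged within the timeout.
    if not acknowledged and now - alert_time >= ack_timeout:
        targets.append(rotation.backup_at(alert_time))
    return targets


team = Rotation(["ana", "ben", "chen"], start=datetime(2024, 1, 1))
alert = datetime(2024, 1, 9, 3, 0)  # 3 a.m. during the second weekly window
print(team.primary_at(alert))                                   # ben
print(paged_so_far(team, alert, alert + timedelta(minutes=20),
                   acknowledged=False))                          # ['ben', 'chen']
```

With a one-person roster, `backup_at` returns the same person as `primary_at`, so escalation has nowhere to go.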
The first job on call is usually not to fix the root cause. It is to stabilize the system and mitigate the user impact, and only then to hand off or investigate calmly.
Keeping it humane
On-call duty burns people out when it is noisy or lonely. Healthy rotations:
- Keep page volume low by tuning alerts to real symptoms
- Track toil so recurring pages get fixed, not just acknowledged
- Offer compensation or time back, and never staff a rotation with one person
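Tracking toil can start as simply as counting which alerts page most often. The sketch below assumes a plain list of alert names exported from whatever paging tool the team uses; the function name `recurring_pages`, the alert names, and the threshold of three pages are hypothetical choices, not a standard.

```python
from collections import Counter


def recurring_pages(page_log: list[str], threshold: int = 3) -> list[tuple[str, int]]:
    # Count how often each alert paged someone during the review period.
    counts = Counter(page_log)
    # Keep only alerts that recur often enough to deserve a permanent fix,
    # most frequent first (ties keep the order they were first seen).
    return [(name, n) for name, n in counts.most_common() if n >= threshold]


# Hypothetical alert names from one week of pages.
week = ["disk_full", "api_latency", "disk_full", "cert_expiry",
        "disk_full", "api_latency", "disk_full", "api_latency"]
print(recurring_pages(week))  # [('disk_full', 4), ('api_latency', 3)]
```

Alerts at the top of this list are candidates for a permanent fix, or for retuning so they fire only on a real user-facing symptom.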
A blameless culture matters too, so that responders can share what broke without fear.
Key idea
An on-call rotation shares response duty through a schedule, an escalation policy, and runbooks, and it stays sustainable only when pages map to real symptoms.