The On-Call Rotation

Sharing the duty of responding to alerts sustainably across a team.

Someone has to answer when production breaks at three in the morning. An on-call rotation assigns that responsibility to one person at a time and cycles it through the team so no one carries it alone.

How it works

  • A schedule names who is on call for each window, often a week
  • An escalation policy pages a backup if the primary does not acknowledge in time
  • A runbook gives steps to triage common alerts so the responder is not starting blind
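The schedule and escalation policy above can be sketched in a few lines of Python. This is a minimal illustration, not any paging tool's API: the `Rotation` class, its one-week default shift, the rule that the backup is the next person in the roster, and the 15-minute acknowledgement timeout are all assumptions chosen for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Rotation:
    """Hypothetical weekly schedule: shifts cycle through `engineers` in order."""
    engineers: list[str]
    start: datetime                        # when the first shift began
    shift: timedelta = timedelta(weeks=1)  # assumed window length

    def __post_init__(self) -> None:
        # A one-person rotation leaves no backup to escalate to.
        if len(self.engineers) < 2:
            raise ValueError("a rotation needs at least two engineers")

    def _slot(self, when: datetime) -> int:
        # Number of whole shifts elapsed since `start`.
        return (when - self.start) // self.shift

    def primary_at(self, when: datetime) -> str:
        return self.engineers[self._slot(when) % len(self.engineers)]

    def secondary_at(self, when: datetime) -> str:
        # Assumption: the backup is whoever is next in the roster.
        return self.engineers[(self._slot(when) + 1) % len(self.engineers)]


def who_gets_paged(
    rotation: Rotation,
    paged_at: datetime,
    acked_after: Optional[timedelta],
    ack_timeout: timedelta = timedelta(minutes=15),  # assumed timeout
) -> list[str]:
    """Escalation: page the primary; if they do not acknowledge in time, also page the backup."""
    notified = [rotation.primary_at(paged_at)]
    if acked_after is None or acked_after > ack_timeout:
        notified.append(rotation.secondary_at(paged_at))
    return notified


rota = Rotation(["ana", "ben", "chen"], start=datetime(2024, 1, 1))
print(rota.primary_at(datetime(2024, 1, 9)))  # ben (second week)
print(who_gets_paged(rota, datetime(2024, 1, 9, 3), acked_after=None))  # ['ben', 'chen']
```

Each engineer takes a full week in roster order, and a page that goes unacknowledged past the timeout also reaches the backup. Rejecting a one-person roster outright is one way to encode the rule that a rotation always needs a backup.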

The first job on call is usually not to fix the root cause. It is to stabilize the system and mitigate the user impact, then hand off or investigate calmly.

Keeping it humane

On-call duty burns people out when it is noisy or lonely. Healthy rotations:

  • Keep page volume low by tuning alerts to real symptoms
  • Track toil so recurring pages get fixed, not just acknowledged
  • Offer compensation or time back, and never staff a rotation with one person
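Tracking toil can start as simply as counting pages per alert. The sketch below assumes each page is logged with the alert name and whether the responder had to act; the `Page` record, the threshold of three, and the alert names are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class Page:
    alert: str        # name of the alert that fired (hypothetical field)
    actionable: bool  # did the responder actually have to do something?


def recurring_alerts(pages: list[Page], threshold: int = 3) -> list[tuple[str, int]]:
    """Alerts that fired at least `threshold` times: candidates for a real fix, not another ack."""
    counts = Counter(page.alert for page in pages)
    return [(alert, n) for alert, n in counts.most_common() if n >= threshold]


def noise_ratio(pages: list[Page]) -> float:
    """Fraction of pages that needed no action; a high value points at alerts to retune or delete."""
    if not pages:
        return 0.0
    return sum(not page.actionable for page in pages) / len(pages)


# One hypothetical week of pages.
week = (
    [Page("disk_full", True)] * 4
    + [Page("cpu_high", False)] * 3
    + [Page("checkout_errors", True)]
)
print(recurring_alerts(week))  # [('disk_full', 4), ('cpu_high', 3)]
print(noise_ratio(week))       # 0.375
```

In this made-up week, `cpu_high` fires often, never needs action, and tracks a cause rather than a user-facing symptom, so it is the first alert to retune. `disk_full` keeps needing real work, which marks it as toil worth eliminating at the source.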

A blameless culture matters too, so that responders can share what broke without fear.

Key idea

A rotation shares response duty with a schedule, escalation, and runbooks, and stays sustainable only when pages map to real symptoms.

Check yourself

1. What does an escalation policy do?

2. What is the first responsibility of an on-call engineer during an incident?