System Design

Cron at Scale

Running scheduled jobs reliably across a fleet without duplicates or gaps.

Why single-host cron breaks

Traditional cron runs on one machine. If that host dies, every scheduled job silently stops. Running the same crontab on many hosts instead makes each job fire once per host. Neither is acceptable at scale.

The reliable pattern

  • A single leader, elected through a lock service, owns the schedule, so only one scheduler triggers each job.
  • Each job run is stored as a record keyed by job name and fire time, so triggering the same run twice maps to the same record.
  • A worker claims the run record before executing, so a duplicate trigger finds the run already claimed and does nothing.
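The claim step can be sketched as follows. This is a minimal illustration, not a prescribed implementation: it assumes a SQL store (SQLite here, for self-containment) where the primary key on job name and fire time enforces uniqueness. The table name `runs` and the helpers `open_store` and `try_claim` are hypothetical names chosen for this example.

```python
import sqlite3
from datetime import datetime, timezone


def open_store(path=":memory:"):
    """Open the (hypothetical) run store and create the runs table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS runs ("
        " job_name TEXT NOT NULL,"
        " fire_time TEXT NOT NULL,"
        " claimed_by TEXT NOT NULL,"
        " PRIMARY KEY (job_name, fire_time))"
    )
    return conn


def try_claim(conn, job_name, fire_time, worker_id):
    """Return True if this worker claimed the run, False if it was already claimed.

    The primary key on (job_name, fire_time) makes the insert fail for any
    second claimant, so a duplicate trigger cannot start a second execution.
    """
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO runs (job_name, fire_time, claimed_by) VALUES (?, ?, ?)",
                (job_name, fire_time.isoformat(), worker_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = open_store()
fire = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
first = try_claim(conn, "daily-report", fire, "worker-a")   # first trigger wins
second = try_claim(conn, "daily-report", fire, "worker-b")  # duplicate is rejected
```

Only the worker whose `try_claim` call returns True should run the job. A production system would also need to handle a worker that crashes after claiming, for example with a lease or timeout on the claim, which this sketch leaves out.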

Handling missed windows

If the scheduler was down at a fire time, it must decide on recovery whether to backfill the missed run or skip it. The choice depends on the job: a report is still useful when it runs late, so it can backfill, while a notification sent hours late probably should not be sent at all.
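One way to make that decision explicit is a per-job policy that the scheduler consults on recovery. The sketch below assumes a fixed-interval schedule rather than full cron expressions, and the names `BACKFILL`, `SKIP`, `missed_fire_times`, and `runs_to_trigger` are hypothetical, introduced only for this example.

```python
from datetime import datetime, timedelta, timezone

BACKFILL = "backfill"  # run every missed fire time after recovery
SKIP = "skip"          # drop missed fire times and resume with the next one


def missed_fire_times(last_fired, now, interval):
    """List the fire times strictly after last_fired and strictly before now."""
    times = []
    t = last_fired + interval
    while t < now:
        times.append(t)
        t += interval
    return times


def runs_to_trigger(policy, last_fired, now, interval):
    """Return the missed fire times that should still be executed under the policy."""
    missed = missed_fire_times(last_fired, now, interval)
    if policy == BACKFILL:
        return missed
    if policy == SKIP:
        return []
    raise ValueError(f"unknown missed-window policy: {policy!r}")


last_fired = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
recovered_at = datetime(2024, 1, 1, 3, 30, tzinfo=timezone.utc)
hourly = timedelta(hours=1)

report_runs = runs_to_trigger(BACKFILL, last_fired, recovered_at, hourly)
notification_runs = runs_to_trigger(SKIP, last_fired, recovered_at, hourly)
```

Here the hourly report would be backfilled for the 01:00, 02:00, and 03:00 windows, while the notification job triggers nothing until its next regular fire time. Because runs are keyed by job name and fire time, backfilled runs go through the same claim step as regular ones.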

Key idea

Cron at scale elects one scheduler and keys runs by job and fire time so each scheduled job executes exactly once, with an explicit policy for missed windows.

Check yourself

1. How does a scaled cron avoid firing the same job multiple times?

2. What decision arises after the scheduler was down during a fire time?