Databases

The Rolling Restart

Restarting database nodes one at a time keeps the cluster serving traffic while every node picks up new config or a new version.

The Goal

You need to restart every node in a cluster, perhaps for a version upgrade or a config change, without taking the whole service down. A rolling restart does this one node at a time, so the cluster as a whole stays available throughout.

The Loop

For each node, in turn:

  • Drain it so it stops taking new work.
  • Restart it with the new version or config.
  • Wait until it is healthy, confirming it has rejoined the cluster and caught up on replication, before touching the next node.
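The loop above can be sketched in Python. This is a minimal in-memory simulation, not a real database client: `Node`, `drain`, `restart`, and `is_healthy` are hypothetical stand-ins for whatever your database and tooling actually provide, and the simulated health check always succeeds.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical in-memory stand-in for one database node."""
    name: str
    version: str
    serving: bool = True

    def drain(self) -> None:
        # Stop taking new work (e.g. pull the node out of the load balancer).
        self.serving = False

    def restart(self, new_version: str) -> None:
        # Simulated restart: the node comes back on the new version.
        self.version = new_version

    def is_healthy(self) -> bool:
        # Simulation only: a real check would confirm the node rejoined the
        # cluster and caught up on replication.
        return True


def rolling_restart(nodes: list[Node], new_version: str) -> list[str]:
    """Restart nodes one at a time and return the order of steps taken."""
    log: list[str] = []
    for node in nodes:
        node.drain()
        log.append(f"drain {node.name}")
        node.restart(new_version)
        log.append(f"restart {node.name}")
        # Never move on until this node is back and healthy.
        if not node.is_healthy():
            raise RuntimeError(f"{node.name} did not come back healthy; stopping")
        node.serving = True  # put it back into service
        log.append(f"healthy {node.name}")
    return log
```

For three nodes, the returned log starts with `drain db1`, `restart db1`, `healthy db1` and only then moves on to `db2`, which is the one-at-a-time ordering the loop requires.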

Crucially, you never move to the next node until the current one is back and healthy. If a restarted node has not actually recovered and you take down the next one anyway, too few nodes may be left serving, and the cluster can suffer an outage.
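A health gate usually needs a timeout, so that a node which never comes back halts the rollout instead of stalling it forever, plus a check that enough nodes will remain serving before the next one is drained. The sketch below assumes a caller-supplied `check` function (for example, one that tests whether the node has rejoined and its replication lag is small) and a hypothetical `min_serving` capacity floor; the default timeout and poll interval are illustrative, not recommendations.

```python
import time
from typing import Callable


def wait_until_healthy(check: Callable[[], bool],
                       timeout_s: float = 300.0,
                       poll_s: float = 5.0) -> bool:
    """Poll check() until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False  # give up: the caller should halt the rollout
        # Never sleep past the deadline.
        time.sleep(min(poll_s, remaining))


def safe_to_drain(healthy_count: int, min_serving: int) -> bool:
    """Only take a node down if the rest can still carry the load."""
    # Draining one node leaves healthy_count - 1 nodes serving.
    return healthy_count - 1 >= min_serving
```

A `False` from either function is the signal to stop: in a three-node cluster that must keep two nodes serving, `safe_to_drain(3, 2)` allows the next drain, but `safe_to_drain(2, 2)` does not.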

Special Care For The Primary

In a replicated cluster, restart the replicas first. The primary goes last, usually via a controlled failover that promotes an already restarted replica, so the write path is handed off cleanly rather than dropped. The old primary can then be restarted like any other replica.
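The replicas-first, primary-last ordering can be sketched as a small planner. `Member` and `restart_primary_last` are hypothetical names, each "restart" step stands for the full drain, restart, and wait-for-healthy cycle from the loop, and choosing the first replica as the failover target is a simplification; in practice you would promote a replica that is fully caught up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Member:
    """Hypothetical cluster member with a role and a version."""
    name: str
    role: str      # "primary" or "replica"
    version: str


def restart_primary_last(members: list[Member], new_version: str) -> list[str]:
    """Restart replicas first, fail over, then restart the old primary."""
    steps: list[str] = []
    primary = next(m for m in members if m.role == "primary")
    replicas = [m for m in members if m.role == "replica"]
    if not replicas:
        raise ValueError("need at least one replica to fail over to")

    # 1. Replicas first: writes keep flowing to the untouched primary.
    for r in replicas:
        r.version = new_version
        steps.append(f"restart {r.name}")

    # 2. Controlled failover: promote an already restarted replica so the
    #    write path moves cleanly instead of being dropped.
    new_primary = replicas[0]
    primary.role, new_primary.role = "replica", "primary"
    steps.append(f"failover {primary.name} -> {new_primary.name}")

    # 3. The old primary is now a replica and is restarted like the others.
    primary.version = new_version
    steps.append(f"restart {primary.name}")
    return steps
```

With `db1` as primary and `db2`, `db3` as replicas, the plan is: restart `db2`, restart `db3`, fail over from `db1` to `db2`, then restart `db1`.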

Key idea

A rolling restart cycles nodes one at a time, draining and waiting for each to rejoin healthy before the next, and handles the primary last via controlled failover to keep the cluster serving throughout.

Check yourself

1. What must happen before moving to the next node in a rolling restart?

2. How is the primary usually handled in a rolling restart?

3. What is the main benefit of restarting one node at a time?