The Rolling Restart

Restarting database nodes one at a time keeps the cluster serving traffic while every node picks up new config or a new version.

The Goal

You need to restart every node in a cluster, perhaps for a version upgrade or a config change, without taking the whole service down. A rolling restart does this one node at a time, so the cluster as a whole stays available throughout.

The Loop

For each node, in turn:

Drain it so it stops taking new work.
Restart it with the new version or config.
Wait for it to report healthy, confirming it has rejoined the cluster and caught up on replication, before touching the next node.

Crucially, never move to the next node until the current one is back and healthy. Skipping that check can leave too few nodes serving and trigger an outage.
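
The loop above can be sketched in a few lines of Python. This is a minimal illustration, not a production tool: the `drain`, `restart`, and `is_healthy` callables are hypothetical hooks you would supply for your own database, and the timeout and polling values are arbitrary assumptions.

```python
import time
from typing import Callable, Iterable, List


def rolling_restart(
    nodes: Iterable[str],
    drain: Callable[[str], None],       # hypothetical hook: stop sending new work to the node
    restart: Callable[[str], None],     # hypothetical hook: restart with the new version or config
    is_healthy: Callable[[str], bool],  # hypothetical hook: True once rejoined and caught up
    timeout_s: float = 300.0,           # assumed limit; tune for your cluster
    poll_s: float = 5.0,                # assumed polling interval
) -> List[str]:
    """Restart nodes one at a time, aborting if a node does not recover."""
    done: List[str] = []
    for node in nodes:
        drain(node)
        restart(node)
        deadline = time.monotonic() + timeout_s
        # Block here until the node is healthy; never advance early.
        while not is_healthy(node):
            if time.monotonic() >= deadline:
                # Stop the rollout rather than take another node out of service.
                raise TimeoutError(
                    f"{node} not healthy after {timeout_s}s; restarted so far: {done}"
                )
            time.sleep(poll_s)
        done.append(node)
    return done
```

Raising on timeout, instead of logging and continuing, is the design choice that matters here: a stalled node halts the rollout while the remaining nodes are still serving.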

Special Care For The Primary

In a replicated cluster, restart the replicas first. Restart the primary last, usually by performing a controlled failover to an already-restarted replica, so the write path is handed off cleanly rather than dropped.
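
As a rough sketch of that ordering, assuming a cluster described by a simple name-to-role mapping: `restart_node` and `fail_over` are hypothetical callables, where `restart_node` is assumed to drain, restart, and wait for health, and `fail_over(old_primary, new_primary)` is assumed to perform the controlled handoff. Picking the first restarted replica as the failover target is an arbitrary choice made for illustration.

```python
from typing import Callable, Dict, List


def restart_order(roles: Dict[str, str]) -> List[str]:
    """Return node names with replicas first and the single primary last."""
    replicas = sorted(name for name, role in roles.items() if role == "replica")
    primaries = [name for name, role in roles.items() if role == "primary"]
    if len(primaries) != 1:
        raise ValueError(f"expected exactly one primary, found {len(primaries)}")
    return replicas + primaries


def restart_cluster(
    roles: Dict[str, str],
    restart_node: Callable[[str], None],  # hypothetical: drain, restart, wait for healthy
    fail_over: Callable[[str, str], None],  # hypothetical: hand writes from old to new primary
) -> List[str]:
    order = restart_order(roles)
    *replicas, primary = order
    for node in replicas:
        restart_node(node)
    if replicas:
        # Hand the write path to a replica that is already on the new version.
        fail_over(primary, replicas[0])
    # The old primary is restarted last, after writes have moved elsewhere.
    restart_node(primary)
    return order
```

For example, with `{"db1": "primary", "db2": "replica", "db3": "replica"}` the order is `db2`, `db3`, then a failover from `db1` to `db2`, then `db1`.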

Key idea

A rolling restart cycles nodes one at a time, draining and waiting for each to rejoin healthy before the next, and handles the primary last via controlled failover to keep the cluster serving throughout.
