The Goal
You need to restart every node in a cluster, perhaps for a version upgrade or a config change, without taking the whole service down. A rolling restart does this one node at a time, so the cluster as a whole stays available throughout.
The Loop
For each node, in turn:
- Drain it so it stops taking new work.
- Restart it with the new version or config.
- Wait until it is healthy, meaning it has rejoined the cluster and caught up on replication, before touching the next node.
Crucially, never move on to the next node until the current one is back and healthy. Skipping that check can leave too few nodes serving traffic and trigger an outage.
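The loop above can be sketched in a few lines of Python. This is a minimal illustration, not a specific tool's API: `drain`, `restart`, and `is_healthy` are hypothetical callables you would supply for your own cluster, and the timeout and polling values are arbitrary assumptions.

```python
import time
from typing import Callable, Iterable


def rolling_restart(
    nodes: Iterable[str],
    drain: Callable[[str], None],        # hypothetical: stop routing new work to the node
    restart: Callable[[str], None],      # hypothetical: restart with the new version or config
    is_healthy: Callable[[str], bool],   # hypothetical: True once rejoined and caught up
    timeout_s: float = 300.0,            # assumed limit on how long to wait per node
    poll_s: float = 5.0,                 # assumed delay between health checks
) -> None:
    """Restart nodes one at a time, halting at the first node that fails to recover."""
    for node in nodes:
        drain(node)
        restart(node)
        deadline = time.monotonic() + timeout_s
        # Do not touch the next node until this one reports healthy.
        while not is_healthy(node):
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{node} did not become healthy; halting the rollout")
            time.sleep(poll_s)
```

Raising on timeout stops the rollout instead of continuing, so a single bad node cannot cascade into too few nodes serving.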
Special Care For The Primary
In a replicated cluster, restart the replicas first. Restart the primary last, usually by performing a controlled failover to an already-restarted replica, so the write path is handed off cleanly rather than dropped.
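The ordering can be sketched as follows. The names are hypothetical: `restart_node` stands for the full drain, restart, and wait-for-healthy step on one node, and `fail_over` stands for whatever controlled-failover mechanism your system provides. Choosing the first restarted replica as the failover target is an assumption made for brevity.

```python
from typing import Callable, List


def restart_order(nodes: List[str], primary: str) -> List[str]:
    """Return the nodes with every replica first and the primary last."""
    if primary not in nodes:
        raise ValueError(f"{primary} is not in the node list")
    replicas = [n for n in nodes if n != primary]
    return replicas + [primary]


def restart_with_primary_last(
    nodes: List[str],
    primary: str,
    restart_node: Callable[[str], None],    # hypothetical: drain, restart, wait for healthy
    fail_over: Callable[[str, str], None],  # hypothetical: move the primary role old -> new
) -> None:
    order = restart_order(nodes, primary)
    replicas = order[:-1]
    for replica in replicas:
        restart_node(replica)
    if replicas:
        # Hand the write path to a replica that has already been restarted.
        fail_over(primary, replicas[0])
    # With no replicas there is nothing to fail over to, so this restart drops writes.
    restart_node(primary)
```

After the failover, the old primary is restarted like any other replica, so writes are never pointed at a node that is about to go down.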
Key Idea
A rolling restart cycles through nodes one at a time, draining each and waiting for it to rejoin healthy before moving to the next, and handles the primary last via controlled failover, so the cluster keeps serving throughout.