System Design

Broker Failover

Promoting a follower to leader so a partition stays available after a broker dies.

When a leader dies

If the broker hosting a partition's leader fails, that partition cannot accept writes until a new leader takes over. Failover promotes one of the in-sync followers to leader so the partition recovers quickly.

Who decides

A controller or coordination layer detects the failure, usually through missed heartbeats, and elects a new leader from the in-sync replica set (ISR). Because a write counts as committed only once every in-sync replica has it, choosing only from the ISR ensures the new leader already holds all committed data.
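
The detection and election steps can be sketched in a few lines of Python. This is a minimal illustration and not any real broker's controller code: the `Partition` class, the function names, and the rule of picking the first ISR member in replica order are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Partition:
    """Hypothetical partition state as a controller might track it."""
    leader: int | None
    replicas: list[int]                       # assigned brokers, in preference order
    isr: set[int] = field(default_factory=set)


def find_dead_brokers(last_heartbeat: dict[int, float], now: float, timeout: float) -> set[int]:
    """Treat a broker as failed if its last heartbeat is older than the timeout."""
    return {broker for broker, ts in last_heartbeat.items() if now - ts > timeout}


def elect_leader(partition: Partition, dead: set[int]) -> int | None:
    """Promote the first live ISR member; return None if no clean candidate exists."""
    if partition.leader is not None and partition.leader not in dead:
        return partition.leader               # current leader is healthy, nothing to do
    partition.isr -= dead                     # failed brokers leave the in-sync set
    for broker in partition.replicas:
        if broker in partition.isr:           # followers outside the ISR are skipped
            partition.leader = broker
            return broker
    partition.leader = None                   # partition stays offline
    return None
```

For example, with replicas `[1, 2, 3]`, an ISR of `{1, 3}`, and broker 1 missing its heartbeats, the sketch skips broker 2 (it is alive but not in sync) and promotes broker 3.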

Unclean election

If no in-sync replica survives, an operator may allow an unclean leader election, which promotes an out-of-date follower. This restores availability but can lose recently acknowledged messages: a direct trade of consistency for uptime.
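
The trade can be made concrete with a small sketch. Everything here is hypothetical: the function names, the `allow_unclean` flag, and the choice to promote the most caught-up survivor are assumptions for illustration, and real systems may select the out-of-sync replica differently.

```python
from __future__ import annotations


def choose_leader(log_end: dict[int, int], isr: set[int], live: set[int],
                  allow_unclean: bool) -> int | None:
    """Return the new leader id, or None if the partition must stay offline.

    log_end maps broker id -> offset of the last message that broker has stored.
    """
    clean = sorted(isr & live)
    if clean:
        return clean[0]                       # clean election: an in-sync replica survived
    if not allow_unclean:
        return None                           # prefer consistency: wait for an ISR member
    # Unclean path (assumed policy): promote the most caught-up live replica.
    candidates = sorted(b for b in live if b in log_end)
    if not candidates:
        return None
    return max(candidates, key=lambda b: log_end[b])


def lost_messages(acked_offset: int, new_leader_end: int) -> int:
    """Acknowledged messages the new leader never received."""
    return max(0, acked_offset - new_leader_end)
```

If the failed leader had acknowledged messages up to offset 100 and the best surviving follower only reached offset 95, an unclean election brings the partition back but drops 5 acknowledged messages. With `allow_unclean=False` the partition instead stays offline until an in-sync replica returns.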

After failover

Producers and consumers refresh their metadata to learn the new leader and reconnect. A brief unavailability window occurs while the election completes.
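
On the client side, the refresh-and-reconnect step might look like the sketch below. The `send` and `fetch_leader` callables, the `NotLeaderError` exception, and the retry settings are hypothetical stand-ins, not a real client library's API.

```python
from __future__ import annotations

import time
from typing import Callable, Optional


class NotLeaderError(Exception):
    """Raised by the hypothetical send function when the target broker is not the leader."""


def send_with_refresh(
    send: Callable[[int, bytes], None],
    fetch_leader: Callable[[], Optional[int]],
    record: bytes,
    cached_leader: int,
    max_attempts: int = 5,
    backoff_s: float = 0.1,
) -> int:
    """Send a record, refreshing the cached leader whenever it proves stale.

    Returns the id of the broker that accepted the write.
    """
    leader: Optional[int] = cached_leader
    for attempt in range(max_attempts):
        if leader is not None:
            try:
                send(leader, record)
                return leader
            except (NotLeaderError, ConnectionError):
                pass                          # cached leader is stale or unreachable
        time.sleep(backoff_s * (2 ** attempt))  # wait out the election window
        leader = fetch_leader()               # refresh metadata; None means no leader yet
    raise TimeoutError("no leader available after retries")
```

The exponential backoff reflects the brief unavailability window: the first metadata refresh may still report no leader, so the client waits and asks again instead of failing immediately.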

Key idea

Failover elects a new leader from the in-sync set to restore a partition; unclean election accepts possible data loss to restore availability when no in-sync replica is left.

Check yourself

1. Why elect the new leader from the in sync replica set?

2. What does an unclean leader election risk?

3. What must producers and consumers do after a failover?