Partition Tolerance And Split Brain
A network partition occurs when links fail and a cluster splits into groups that cannot talk to each other, even though the nodes in each group are still running. Partition tolerance means the system keeps functioning despite this. The CAP theorem says that during a partition a system must choose between consistency and availability; it cannot guarantee both.
The dangerous failure is split brain. Suppose a cluster has a leader, and a partition cuts the cluster in two. The old leader is in one half. The other half cannot reach the leader, declares it dead, and elects a new one. Now both halves have a leader, both accept writes, and the data diverges. When the partition heals, the two histories conflict, and reconciling them typically means discarding one side's writes.
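The scenario above can be sketched in a few lines of Python. This is a toy model, not a real consensus protocol: the `Group` class, the `elect` rule (lowest node name wins), and the `split` helper are all hypothetical names made up for illustration. It assumes no quorum check at all, which is exactly what lets both sides keep writing.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """One side of a partition: its nodes, its leader, and its copy of the log."""
    nodes: set
    leader: str
    log: list = field(default_factory=list)

    def write(self, entry: str) -> None:
        # No quorum check here: any group that has a leader accepts writes.
        self.log.append(entry)


def elect(nodes: set) -> str:
    # Hypothetical election rule for the sketch: the lowest node name wins.
    return min(nodes)


def split(all_nodes: set, leader: str, log: list, side_a: set) -> list:
    """Cut the cluster into side_a and everything else, with no quorum rule."""
    side_b = all_nodes - side_a
    groups = []
    for members in (side_a, side_b):
        # A side that still contains the old leader keeps it;
        # the other side declares it dead and elects its own.
        group_leader = leader if leader in members else elect(members)
        groups.append(Group(set(members), group_leader, list(log)))
    return groups


cluster = {"n1", "n2", "n3", "n4", "n5"}
a, b = split(cluster, leader="n1", log=["x=0"], side_a={"n1", "n2"})

# Both sides have a leader and both accept a conflicting write.
a.write("x=1")
b.write("x=2")

print(a.leader, a.log)
print(b.leader, b.log)
```

After the two writes, both logs share the prefix `x=0` but disagree on the next entry, which is the divergence that has to be resolved once the partition heals.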
The standard defense is a majority quorum requirement. A group may elect a leader and accept writes only if it holds a strict majority of the cluster's full membership, not merely of the nodes it can currently reach. However the cluster splits, at most one side can hold a majority, so at most one side stays active. Any minority side refuses to act, sacrificing availability there to preserve consistency. This is why clusters favor odd node counts: an even-sized cluster can split into two equal halves, leaving neither with a majority, whereas an odd-sized cluster cannot tie in a two-way split.
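The majority rule comes down to simple arithmetic, sketched below. The function names (`majority`, `has_quorum`, `tolerated_failures`) are illustrative and not taken from any particular system. The one assumption that matters is that the quorum is measured against the full cluster size rather than against the nodes a group can currently see.

```python
def majority(cluster_size: int) -> int:
    """Smallest node count that is strictly more than half the cluster."""
    return cluster_size // 2 + 1


def has_quorum(reachable: int, cluster_size: int) -> bool:
    """True if a group of `reachable` nodes may elect a leader and accept writes."""
    return reachable >= majority(cluster_size)


def tolerated_failures(cluster_size: int) -> int:
    """How many nodes can be lost while the remainder still holds a majority."""
    return cluster_size - majority(cluster_size)


# A 5-node cluster split 2/3: only the 3-node side stays active.
five_node_split = [has_quorum(side, 5) for side in (2, 3)]

# A 4-node cluster split 2/2: neither side has a majority, so both stop.
four_node_split = [has_quorum(side, 4) for side in (2, 2)]

# Growing from an odd size to the next even size adds no extra fault tolerance.
tolerance = {n: tolerated_failures(n) for n in range(3, 7)}

print(five_node_split)   # [False, True]
print(four_node_split)   # [False, False]
print(tolerance)         # {3: 1, 4: 1, 5: 2, 6: 2}
```

The last line shows the other reason odd sizes are preferred: a 4-node cluster tolerates one failure, the same as a 3-node cluster, so the extra node raises the quorum size without buying any additional tolerance.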
- Split brain: Two leaders accept conflicting writes during a partition.
- Majority quorum: Only the side with more than half of all nodes can act.
- Odd sizing: Avoids a tie where neither side has a majority.
Key idea
A partition can cause split brain, where two halves each elect a leader and their data diverges. Systems therefore require a majority quorum to act, which lets at most one side continue and favors consistency over availability.