What consensus means
Consensus is the problem of getting a group of nodes to agree on one value even when some nodes fail and messages are delayed. It underlies leader election, distributed locks, and replicated logs.
The properties
A correct consensus protocol must satisfy:
- Agreement no two nodes decide different values
- Validity the decided value was actually proposed by someone
- Termination every non faulty node eventually decides
- Integrity a node decides at most once
Why it is hard
Nodes have no shared clock and cannot tell a crashed peer from a slow one. A naive vote can stall forever if proposers keep colliding. Real protocols add structure such as rounds, leaders, and quorums to make progress likely.
Where it shows up
- Replicated state machines apply the same commands in the same order
- Configuration stores like etcd and ZooKeeper agree on cluster metadata
- Distributed locks need agreement on who holds the lock
The catch
You will see in the FLP result that termination cannot be guaranteed in a purely asynchronous network with even one failure, which is why practical systems lean on timeouts.
Key idea
Consensus is agreement on one value despite failures, defined by agreement validity termination and integrity, and it powers replicated logs locks and configuration.