Byzantine Fault Tolerance

Reaching agreement when nodes may lie, send conflicting messages, or act maliciously.

Beyond crashes

Most consensus assumes nodes either work correctly or stop. A Byzantine fault is worse, a node that behaves arbitrarily, perhaps sending different answers to different peers, whether from a bug, corruption, or an attacker. Byzantine fault tolerance survives these liars.

The famous bound

To tolerate f Byzantine nodes you need at least 3f plus 1 total nodes. The intuition is that honest nodes must form a clear majority even when faulty nodes lie and some honest ones are slow or unreachable.

How protocols cope

Protocols like PBFT make nodes exchange messages in rounds and only commit when a supermajority agree, so a few liars cannot forge a decision. Cryptographic signatures prevent a node from forging messages on behalf of another.

The cost

Byzantine agreement is expensive. It needs more nodes, more message rounds, and signature verification, which limits throughput. Inside a trusted datacenter, crash tolerance like Raft is usually enough.

Where it matters

BFT shines where participants do not trust each other, such as blockchains and aerospace systems with unreliable sensors. The trust model justifies the overhead.

Key idea

Byzantine fault tolerance survives nodes that lie by requiring at least three f plus one nodes and supermajority agreement, at significant cost.