← Lessons

quiz vs the machine

Platinum1750

System Design

Byzantine Fault Tolerance

Reaching agreement when nodes may lie, send conflicting messages, or act maliciously.

6 min read · advanced · beat Platinum to climb

Beyond crashes

Most consensus assumes nodes either work correctly or stop. A Byzantine fault is worse, a node that behaves arbitrarily, perhaps sending different answers to different peers, whether from a bug, corruption, or an attacker. Byzantine fault tolerance survives these liars.

The famous bound

To tolerate f Byzantine nodes you need at least 3f plus 1 total nodes. The intuition is that honest nodes must form a clear majority even when faulty nodes lie and some honest ones are slow or unreachable.

How protocols cope

Protocols like PBFT make nodes exchange messages in rounds and only commit when a supermajority agree, so a few liars cannot forge a decision. Cryptographic signatures prevent a node from forging messages on behalf of another.

The cost

Byzantine agreement is expensive. It needs more nodes, more message rounds, and signature verification, which limits throughput. Inside a trusted datacenter, crash tolerance like Raft is usually enough.

Where it matters

BFT shines where participants do not trust each other, such as blockchains and aerospace systems with unreliable sensors. The trust model justifies the overhead.

Key idea

Byzantine fault tolerance survives nodes that lie by requiring at least three f plus one nodes and supermajority agreement, at significant cost.

Check yourself

Answer to earn rating on the learn ladder.

1. How many total nodes are needed to tolerate f Byzantine faults?

2. What distinguishes a Byzantine fault from a crash fault?

3. Why is crash tolerance like Raft usually enough inside a datacenter?