PBFT Overview

What PBFT made practical

Practical Byzantine Fault Tolerance, by Castro and Liskov in 1999, was the first BFT protocol fast enough for real systems. It tolerates f faulty nodes out of 3f plus 1 while serving requests with reasonable latency.

The primary and replicas

One node acts as the primary and orders requests; the rest are backups. A request flows through three phases that build agreement and detect a bad primary.

The three phases

Pre prepare the primary assigns a sequence number and broadcasts the request
Prepare replicas broadcast that they saw the same assignment; collecting 2f matching messages proves agreement on order
Commit replicas broadcast commit; collecting 2f plus 1 commits means the request is committed and can execute

The two voting rounds ensure that even with f liars, honest nodes agree on a single ordering.

View changes

If the primary is faulty or slow, replicas trigger a view change to elect a new primary, preserving safety across the switch.

Trade offs

Strong safety with up to f Byzantine nodes
Message cost grows quadratically with node count
Best for small permissioned clusters, not open networks

Key idea