Closing the Gaps
Hinted handoff and dropped messages leave replicas slightly out of sync. Anti entropy is the background process that compares two replicas and repairs differences. Comparing every key directly would move huge amounts of data, so replicas use Merkle trees to find divergence cheaply.
How the Tree Works
A Merkle tree is a tree of hashes. Each leaf hashes a small range of keys, and each parent hashes its children. The root hash summarizes the whole data set.
- If two roots match, the replicas are identical and no transfer is needed.
- If roots differ, the replicas compare child hashes and walk down only the branches that disagree.
- Divergence is isolated to a few leaf ranges, which are then exchanged.
Why It Scales
The cost of finding differences is proportional to the amount of data that actually differs, not the size of the data set. Two nearly identical replicas confirm equality in a single root comparison, while a small drift only forces a few branches to be exchanged.
Key idea
Merkle trees let replicas compare root hashes and descend only into differing branches, so anti entropy syncs work scales with how much data actually diverges.