← Lessons

quiz vs the machine

Platinum1760

System Design

The Merkle Tree Anti Entropy Revisited

Efficiently finding which keys differ between two replicas by comparing hashes top down.

5 min read · advanced · beat Platinum to climb

Why background repair is needed

Read repair and hinted handoff fix divergence as a side effect of traffic, but cold data that is rarely read can stay inconsistent for a long time. An anti entropy process runs in the background to compare replicas and copy missing data.

The naive approach, sending every key to compare, is far too expensive. Merkle trees make the comparison efficient.

How the tree works

A Merkle tree is a tree of hashes. Each leaf hashes one bucket of keys. Each internal node hashes its children. The root therefore summarizes the entire data range in a single hash.

To compare two replicas:

  • Exchange roots. If they match, the replicas are identical and nothing more is sent.
  • If roots differ, compare child hashes, descending only into subtrees that disagree.
  • Continue until reaching the leaves that actually differ, then sync just those keys.

This means the amount of data exchanged is proportional to the number of differences, not the total dataset size.

Key idea

Merkle trees let anti entropy pinpoint divergent keys by comparing hashes from the root down, transferring only the data that actually differs.

Check yourself

Answer to earn rating on the learn ladder.

1. What does each leaf of a Merkle tree hash?

2. Why are Merkle trees efficient for comparing replicas?

3. What problem does anti entropy specifically address?