The idea
In an eventually consistent store, replicas can drift apart after failures or dropped writes. Read repair uses the reads that already happen to detect and heal this drift, so the system slowly converges without a separate sweep.
When a read gathers responses from several replicas, the coordinator compares their versions. If some replicas return stale data, the coordinator pushes the newest value back to them.
Two flavors
- Foreground read repair fixes the divergent replicas before returning to the client, costing a little extra latency on that request.
- Background read repair returns the freshest value to the client immediately and repairs the lagging replicas asynchronously.
Why it is not enough alone
Read repair only heals keys that are actually read. Rarely read keys can stay stale for a long time, so systems pair it with anti entropy background jobs that compare whole datasets and reconcile them.
Key idea
Read repair piggybacks on normal reads to converge stale replicas, but it needs anti entropy for keys nobody reads.