What Read Amplification Is
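Read amplification is the number of physical I/O operations the storage engine must perform to answer one logical read. If a key lookup touches the memtable and three on-disk files, that read has an amplification of four when every structure probed is counted, or three when only disk reads are counted, since the memtable lives in memory.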
Where It Comes From
- In an LSM tree, a key may live in the memtable and several SSTables, so the engine checks each one until it finds the newest value.
- In a B-tree, the engine walks from the root to a leaf, touching one page per level.
LSM trees usually have higher read amplification because versions of a key can be spread across several immutable files, and a lookup may have to check more than one of them.
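The LSM lookup path described above can be sketched in a few lines. This is a minimal illustration, not any engine's actual read path: the `Table` class, the `lsm_get` function, and the table names are hypothetical, and each table is modeled as an in-memory dict so that one probe stands in for one lookup.

```python
class Table:
    """Hypothetical stand-in for a memtable or SSTable: a name plus a dict."""

    def __init__(self, name, data):
        self.name = name
        self.data = data


def lsm_get(key, tables):
    """Probe tables newest-first; return (value, number of tables probed)."""
    probes = 0
    for table in tables:
        probes += 1
        if key in table.data:
            # The first hit is the newest version, so the search stops here.
            return table.data[key], probes
    return None, probes


# Newest first: the memtable, then SSTables from most to least recent.
tables = [
    Table("memtable", {"a": 3}),
    Table("sst-3", {"b": 2}),
    Table("sst-2", {"a": 1, "c": 7}),
    Table("sst-1", {"d": 9}),
]

print(lsm_get("a", tables))        # (3, 1): newest copy found in the memtable
print(lsm_get("c", tables))        # (7, 3): three structures probed
print(lsm_get("missing", tables))  # (None, 4): a miss probes everything
```

Note that a lookup for a key that does not exist is the worst case in this sketch, because nothing stops the search early.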
How Engines Reduce It
- Bloom filters let a reader skip an SSTable that definitely does not contain the key.
- Block caches keep hot blocks in memory, so reads that hit the cache avoid disk entirely.
- Compaction reduces the number of files a read must inspect.
- Sparse indexes narrow a lookup to the one block within a file that could hold the key, so the reader does not scan the whole file.
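To make the Bloom filter item concrete, here is a rough sketch of a reader that consults a per-file filter before touching the file. Everything in it is an assumption made for illustration: the `BloomFilter` and `SSTable` classes, the `get_with_filters` function, the filter size, and the use of SHA-256 as the hash are not taken from any particular engine.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter; uses one byte per bit to keep the code short."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)

    def _positions(self, key):
        # Derive num_hashes positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


class SSTable:
    """Hypothetical SSTable: a dict plus a filter built over its keys."""

    def __init__(self, data):
        self.data = data
        self.bloom = BloomFilter()
        for key in data:
            self.bloom.add(key)


def get_with_filters(key, sstables):
    """Return (value, file reads), skipping files whose filter rules the key out."""
    reads = 0
    for sst in sstables:
        if not sst.bloom.might_contain(key):
            continue  # skipped without touching the file
        reads += 1
        if key in sst.data:
            return sst.data[key], reads
    return None, reads


sstables = [SSTable({"x": 1}), SSTable({"y": 2}), SSTable({"z": 3})]
value, reads = get_with_filters("z", sstables)
print(value, reads)
```

A filter can return a false positive, so the read count is not guaranteed to drop to one, but it never returns a false negative, so a skipped file is always safe to skip.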
The Tradeoff
Reducing read amplification through aggressive compaction increases write amplification, because compaction rewrites data that was already written once. Engines balance the two based on whether the workload is read-heavy or write-heavy.
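The tradeoff can be illustrated with a small sketch that merges several files into one. It is a simplification under stated assumptions: files are plain dicts ordered newest first, the `compact` and `probes_for` functions are hypothetical, and the cost of compaction is approximated by the number of entries rewritten rather than by bytes.

```python
def probes_for(key, tables):
    """Count how many tables a newest-first lookup inspects for this key."""
    for count, table in enumerate(tables, start=1):
        if key in table:
            return count
    return len(tables)


def compact(tables):
    """Merge newest-first tables into one; return (merged, entries rewritten)."""
    merged = {}
    # Apply oldest first so that newer values overwrite older ones.
    for table in reversed(tables):
        merged.update(table)
    return merged, len(merged)


# Newest first; "a" has three versions spread over three files.
tables = [{"a": 3}, {"a": 2, "b": 5}, {"a": 1, "c": 9}]

print(probes_for("c", tables))    # 3 files inspected before compaction
merged, rewritten = compact(tables)
print(probes_for("c", [merged]))  # 1 file inspected after compaction
print(rewritten)                  # 3 entries written again to get there
```

The read gets cheaper only because every surviving entry was written a second time, which is the write amplification the text refers to.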
Key idea
Read amplification is the count of physical lookups per logical read, and Bloom filters, caches, and compaction keep it under control.