Read Amplification

Read amplification counts how many disk lookups the engine performs to satisfy a single logical read from the application.

What Read Amplification Is

Read amplification is the number of physical I/O operations the storage engine must perform to answer one logical read. The memtable lives in memory, so checking it costs no disk I/O. If a key lookup misses the memtable and then reads three on-disk files, that read had a read amplification of three.
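Because individual reads vary, read amplification is often reported as an average over many logical reads. The sketch below assumes a hypothetical trace in which each entry is the number of physical I/O operations one logical read performed; the function name is illustrative, not from any engine.

```python
def read_amplification(physical_reads_per_lookup: list[int]) -> float:
    """Average physical I/O operations per logical read (hypothetical helper)."""
    if not physical_reads_per_lookup:
        raise ValueError("need at least one logical read")
    return sum(physical_reads_per_lookup) / len(physical_reads_per_lookup)


# A single lookup that read three on-disk files.
print(read_amplification([3]))        # 3.0
# A cache hit (0 disk reads), a 1-file hit, and a 3-file lookup: 4 reads / 3 lookups.
print(read_amplification([0, 1, 3]))
```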

Where It Comes From

  • In an LSM tree, a key may live in the memtable and in several SSTables, so the engine checks them from newest to oldest and stops at the first match, which holds the newest value.
  • In a B-tree, the engine walks from the root to a leaf, reading one page per level.

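The two lookup paths can be sketched with toy in-memory structures. This assumes plain dicts stand in for the memtable and SSTables, that each SSTable probe costs exactly one disk read, and that every B-tree node is full with `fanout` entries; `lsm_get` and `btree_height` are hypothetical names, not a real engine's API.

```python
def lsm_get(key, memtable: dict, sstables: list[dict]):
    """Toy LSM point lookup. sstables is ordered newest first.

    Returns (value, disk_reads). The memtable is in memory and costs no I/O;
    each SSTable consulted is assumed to cost one disk read.
    """
    if key in memtable:
        return memtable[key], 0
    disk_reads = 0
    for table in sstables:          # newest to oldest
        disk_reads += 1
        if key in table:
            return table[key], disk_reads  # first hit is the newest version
    return None, disk_reads         # key absent: every file was probed


def btree_height(num_keys: int, fanout: int) -> int:
    """Levels a root-to-leaf walk touches, assuming every node is full.

    Each level multiplies capacity by `fanout`; a lookup reads one page per level.
    """
    levels, capacity = 1, fanout
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels


memtable = {"b": 2}
sstables = [{"c": 30}, {"a": 11}, {"a": 10, "c": 3}]  # newest first
print(lsm_get("a", memtable, sstables))  # (11, 2): the older "a" = 10 is never read
print(lsm_get("z", memtable, sstables))  # (None, 3): a miss probes every file
print(btree_height(1_000_000, 100))      # 3 page reads per lookup
```

Note that a miss is the worst case for the LSM path, while the B-tree cost depends only on tree height.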
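LSM trees usually have higher read amplification because updates are written to new immutable files instead of being applied in place, so several versions of a key can accumulate across files until compaction merges them. A B-tree lookup, by contrast, is bounded by the height of the tree.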

How Engines Reduce It

  • Bloom filters let a reader skip an SSTable that definitely does not contain the key. They can return false positives but never false negatives.
  • Block caches keep hot pages in memory, so reads of those pages avoid disk entirely.
  • Compaction reduces the number of files a read must inspect.
  • Sparse indexes point directly to the right block within a file.

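A Bloom filter check can be sketched as follows. This is a minimal textbook filter, not any engine's implementation: it assumes the k bit positions come from double hashing over a single SHA-256 digest, and `BloomFilter`, `build_sstable`, and `lsm_get_with_filters` are hypothetical names. The filter is assumed to be held in memory, so consulting it costs no disk I/O.

```python
import hashlib


class BloomFilter:
    """m-bit array with k hash positions per key (illustrative sketch)."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        digest = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step size
        for i in range(self.k):
            yield (h1 + i * h2) % self.m  # double hashing: h1 + i*h2

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "maybe present".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def build_sstable(table: dict, num_bits: int = 1024, num_hashes: int = 3):
    """Pair a toy SSTable (a dict) with a filter over its keys."""
    bloom = BloomFilter(num_bits, num_hashes)
    for key in table:
        bloom.add(key)
    return bloom, table


def lsm_get_with_filters(key: str, sstables: list):
    """sstables: (bloom, dict) pairs, newest first. Returns (value, disk_reads)."""
    disk_reads = 0
    for bloom, table in sstables:
        if not bloom.might_contain(key):
            continue          # skipped with no disk read
        disk_reads += 1       # "maybe present": pay for one read
        if key in table:
            return table[key], disk_reads
    return None, disk_reads


tables = [build_sstable({"c": 30}), build_sstable({"a": 11}), build_sstable({"a": 10, "c": 3})]
print(lsm_get_with_filters("a", tables))  # value 11, usually after a single disk read
```

Without filters the same lookup would read two files; with them, the newest file is usually skipped. A false positive can still cost a wasted read, which is why filters are sized by bits per key.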
The Tradeoff

Reducing read amplification through aggressive compaction increases write amplification, because compaction rewrites data that has already been written. Engines balance the two based on whether the workload is read-heavy or write-heavy.
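The tradeoff can be seen in a deliberately simplified model. This is a hypothetical policy, not how any real engine compacts: every flush writes one file holding one unit of new data, there are no overwrites, and whenever the file count reaches `max_files` all files are merged into one, rewriting every unit they hold. The function returns write amplification (units written per unit ingested) and the most files a read could have to probe.

```python
def simulate(num_flushes: int, max_files: int) -> tuple[float, int]:
    """Toy compaction model returning (write_amplification, worst_files_per_read)."""
    if max_files < 2:
        raise ValueError("max_files must be at least 2")
    files: list[int] = []   # size of each on-disk file, in units
    written = 0
    worst_files = 0
    for _ in range(num_flushes):
        files.append(1)
        written += 1                    # the flush itself
        if len(files) >= max_files:
            merged = sum(files)
            written += merged           # compaction rewrites all merged data
            files = [merged]
        worst_files = max(worst_files, len(files))  # files a read might probe
    return written / num_flushes, worst_files


print(simulate(8, max_files=2))  # (5.375, 1): aggressive compaction
print(simulate(8, max_files=8))  # (2.0, 7): lazy compaction
```

Compacting eagerly keeps reads to one file but writes over five times the ingested data; compacting lazily writes far less but lets reads probe up to seven files.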

Key idea

Read amplification is the count of physical lookups per logical read, and Bloom filters, caches, and compaction keep it under control.

Check yourself

1. What does read amplification count?

2. How does a Bloom filter reduce read amplification?

3. Why do LSM trees often have higher read amplification than B trees?