← Lessons

quiz vs the machine

Silver1080

Databases

The Bloom Filter in LSM

A bloom filter lets an LSM tree skip files that cannot contain a key, cutting wasted disk reads during point lookups.

4 min read · intro · beat Silver to climb

The Read Problem

A point lookup in an LSM tree may have to inspect many sorted files because a key could live in any of them. Reading each file from disk just to learn the key is absent wastes input output.

What a Bloom Filter Does

A bloom filter is a compact bit array paired with several hash functions. To add a key, each hash sets one bit. To test a key, the filter checks the same bits.

  • If any tested bit is zero, the key is definitely absent.
  • If all bits are one, the key is probably present.

So a filter never gives a false negative, but it can give a false positive. Each sorted file carries its own bloom filter in memory.

Why It Helps

Before touching a file on disk, the engine checks the filter. A negative result lets it skip the file entirely. Only files that pass the filter are actually read, so most absent key lookups avoid disk work.

The Tradeoff

More bits per key lowers the false positive rate but uses more memory. Engines tune this to balance memory against wasted reads.

Key idea

A bloom filter cheaply rules out files that cannot hold a key, so an LSM lookup reads disk only for files that might match.

Check yourself

Answer to earn rating on the learn ladder.

1. What can a bloom filter never produce?

2. How does a bloom filter speed up LSM point reads?

3. What happens when you give a bloom filter more bits per key?