The Bloom Filter in LSM

A bloom filter lets an LSM tree skip files that cannot contain a key, cutting wasted disk reads during point lookups.

The Read Problem

A point lookup in an LSM tree may have to inspect many sorted files because a key could live in any of them. Reading each file from disk just to learn the key is absent wastes input output.

What a Bloom Filter Does

A bloom filter is a compact bit array paired with several hash functions. To add a key, each hash sets one bit. To test a key, the filter checks the same bits.

If any tested bit is zero, the key is definitely absent.
If all bits are one, the key is probably present.

So a filter never gives a false negative, but it can give a false positive. Each sorted file carries its own bloom filter in memory.

Why It Helps

Before touching a file on disk, the engine checks the filter. A negative result lets it skip the file entirely. Only files that pass the filter are actually read, so most absent key lookups avoid disk work.

The Tradeoff

More bits per key lowers the false positive rate but uses more memory. Engines tune this to balance memory against wasted reads.

Key idea