The Read Problem
A point lookup in an LSM tree may have to inspect many sorted files because a key could live in any of them. Reading each file from disk just to learn the key is absent wastes input output.
What a Bloom Filter Does
A bloom filter is a compact bit array paired with several hash functions. To add a key, each hash sets one bit. To test a key, the filter checks the same bits.
- If any tested bit is zero, the key is definitely absent.
- If all bits are one, the key is probably present.
So a filter never gives a false negative, but it can give a false positive. Each sorted file carries its own bloom filter in memory.
Why It Helps
Before touching a file on disk, the engine checks the filter. A negative result lets it skip the file entirely. Only files that pass the filter are actually read, so most absent key lookups avoid disk work.
The Tradeoff
More bits per key lowers the false positive rate but uses more memory. Engines tune this to balance memory against wasted reads.
Key idea
A bloom filter cheaply rules out files that cannot hold a key, so an LSM lookup reads disk only for files that might match.