Answering Might It Be Here
A bloom filter is a small probabilistic structure that answers one question fast: is this key possibly in the set, or definitely not. It can say definitely not with certainty, and possibly yes with a small false positive chance, but it never gives a false negative.
How It Works
- A bit array starts all zeros.
- On insert, several hash functions map the key to several bit positions, which are set to one.
- On query, the same positions are checked. If any is zero the key is definitely absent. If all are one it is probably present.
Why It Helps Caches and Stores
Before doing an expensive disk read or origin lookup, the system checks the bloom filter. A definite no lets it skip the lookup entirely, which is exactly how an LSM tree avoids reading SSTables that cannot hold a key and how a cache rejects obviously absent keys. The price is a small false positive rate, tunable by sizing the bit array and the number of hashes.
Key idea
A bloom filter is a small structure that says a key is definitely absent or possibly present with no false negatives, letting a cache or store skip expensive lookups for keys that cannot be there.