A compact probabilistic set
A Bloom filter answers membership questions using a bit array and several independent hash functions. It can say an item is definitely not present or possibly present, trading a small false positive rate for tiny memory.
Insert and query
To insert an element, hash it with each function, map each hash to a bit position, and set those bits to one. To query, hash the same way and check those bits.
- If any bit is zero, the item is certainly absent.
- If all bits are one, the item is probably present.
False positives happen when other insertions happen to set all the queried bits.
Tuning the parameters
The false positive rate depends on the number of bits, the number of items, and the number of hash functions. For a target rate there is an optimal hash count that balances filling the array too fast against checking too few positions.
- More bits per item lowers the error.
- Too many hashes saturate the array and raise errors.
What it cannot do
A standard Bloom filter has no deletion, since clearing a bit could erase other elements. Counting variants store small counters instead of single bits to allow removal.
Key idea
A Bloom filter sets bits chosen by several hashes per item, so a zero bit proves absence and all ones suggests presence, giving compact membership with tunable false positives but no deletion in the basic form.