← Lessons

quiz vs the machine

Gold1410

Concurrency

The Bloom Filter For Membership

A compact probabilistic set that answers membership with no false negatives.

5 min read · core · beat Gold to climb

A tiny set with a tradeoff

A Bloom filter is a compact structure that answers the question is this item in the set. It uses far less memory than storing the items, at the cost of occasional false positives. Crucially, it never gives a false negative, so if it says not present, that is certain.

How it works

The filter is a bit array plus several hash functions. To add an item, hash it with each function and set the bits at those positions. To test an item, hash it and check those bits.

  • If any of the bits is zero, the item is definitely not in the set.
  • If all the bits are one, the item is probably in the set.
  • Bits set by other items can cause a false positive.

Where it helps coordination

In a cluster, nodes exchange Bloom filters to summarize which keys or members they hold without shipping the full list. A node can cheaply ask might you have this before doing an expensive lookup, skipping work when the filter says no.

Key idea

A Bloom filter answers set membership in tiny space with possible false positives but never false negatives, letting cluster nodes cheaply summarize what they hold and skip lookups when the answer is definitely no.

Check yourself

Answer to earn rating on the learn ladder.

1. What guarantee does a Bloom filter give?

2. When is an item definitely not in the set?

3. Why are Bloom filters useful between cluster nodes?