← Lessons

quiz vs the machine

Silver1120

System Design

The Bloom Filter for Cache Misses

A tiny probabilistic set tells you when a key is definitely absent so you skip the slow lookup.

4 min read · intro · beat Silver to climb

Answering Might It Be Here

A bloom filter is a small probabilistic structure that answers one question fast: is this key possibly in the set, or definitely not. It can say definitely not with certainty, and possibly yes with a small false positive chance, but it never gives a false negative.

How It Works

  • A bit array starts all zeros.
  • On insert, several hash functions map the key to several bit positions, which are set to one.
  • On query, the same positions are checked. If any is zero the key is definitely absent. If all are one it is probably present.

Why It Helps Caches and Stores

Before doing an expensive disk read or origin lookup, the system checks the bloom filter. A definite no lets it skip the lookup entirely, which is exactly how an LSM tree avoids reading SSTables that cannot hold a key and how a cache rejects obviously absent keys. The price is a small false positive rate, tunable by sizing the bit array and the number of hashes.

Key idea

A bloom filter is a small structure that says a key is definitely absent or possibly present with no false negatives, letting a cache or store skip expensive lookups for keys that cannot be there.

Check yourself

Answer to earn rating on the learn ladder.

1. What can a bloom filter never produce?

2. How does a bloom filter speed up cache and store lookups?