← Lessons

quiz vs the machine

Gold1480

Algorithms

Count Min Sketch Deep Dive

A grid of counters and hashes estimates item frequencies in a stream with bounded overcount.

6 min read · core · beat Gold to climb

Counting in small space

A count min sketch estimates how often each item appears in a stream while using far less memory than an exact table. It is a grid of counters with one row per hash function and many columns.

Update and estimate

To record an item, hash it once per row, and for each row increment the counter in the chosen column. To estimate its count, hash it the same way and take the minimum of the counters it touches.

  • Collisions can only add to a counter, never subtract.
  • So each counter is an overestimate, and the minimum is the tightest of them.

Why the minimum

Different items that collide in one row likely do not collide in the others, so at least one of an item's counters tends to be only mildly inflated. Taking the minimum picks that cleanest estimate, bounding the overcount with high probability.

  • More columns reduce the overestimate size.
  • More rows reduce the chance the bound is exceeded.

Strengths and limits

The sketch never undercounts, which makes it ideal for finding heavy hitters. It cannot reliably report items with very small counts, since their estimates are dominated by collision noise.

Key idea

A count min sketch keeps a grid of counters, increments one per row by hashing, and estimates frequency as the minimum across rows, which only ever overcounts and tightly bounds the error for frequent items.

Check yourself

Answer to earn rating on the learn ladder.

1. How does a count min sketch estimate an item's frequency?

2. Which direction can the sketch's estimate err?