System Design

The Count-Min Sketch

Estimating per-item frequencies in a stream using a compact grid of hashed counters.

5 min read · core

The goal

You want to know how often each item appears in a stream, such as requests per IP address, but keeping an exact counter per item costs too much memory when there are millions of distinct keys. A count-min sketch answers frequency queries approximately, in a fixed amount of space chosen up front.

How it works

  • Keep a grid of counters with several rows, each with its own hash function.
  • To add an item, hash it once per row and increment the cell each hash points to.
  • To query an item, read the cell in every row and return the minimum of those counters.
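The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the class name `CountMinSketch`, the `width` and `depth` parameters, and the choice of salted `hashlib.blake2b` as the per-row hash are all assumptions made for the example.

```python
import hashlib


class CountMinSketch:
    """A grid of counters: `depth` rows, `width` columns (hypothetical names)."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Assumption: salting blake2b with the row number stands in for
        # "one hash function per row".
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        # Hash once per row and increment the cell each hash points to.
        for r in range(self.depth):
            self.rows[r][self._index(item, r)] += count

    def query(self, item):
        # Read one cell per row and return the smallest.
        return min(self.rows[r][self._index(item, r)] for r in range(self.depth))


sketch = CountMinSketch(width=1024, depth=4)
for ip in ["10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.1"]:
    sketch.add(ip)

estimate = sketch.query("10.0.0.1")  # at least 3, the true count
```

Memory use is `width * depth` counters no matter how many distinct keys pass through, which is the point of the structure.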

Why the minimum

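Different items can hash to the same cell and inflate its counter, so every cell an item maps to holds at least that item's true count, and more whenever a collision occurred. The smallest of those cells carries the least collision noise, so the minimum is the tightest upper bound the sketch can offer. As long as counts are only ever incremented, the sketch never underestimates.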
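To see the one-sided error concretely, the sketch below feeds 20 keys into a deliberately tiny grid so that collisions are unavoidable, then compares each estimate with the exact count. The helper names (`cell`, `build`, `estimate`), the synthetic `ip-N` keys, and the blake2b-based hashing are assumptions for illustration only.

```python
import hashlib
from collections import Counter


def cell(item, row, width):
    # Assumption: blake2b salted with the row number acts as that row's hash.
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=bytes([row])).digest()
    return int.from_bytes(digest, "little") % width


def build(stream, width, depth):
    grid = [[0] * width for _ in range(depth)]
    for item in stream:
        for r in range(depth):
            grid[r][cell(item, r, width)] += 1
    return grid


def estimate(grid, item):
    width = len(grid[0])
    return min(row[cell(item, r, width)] for r, row in enumerate(grid))


# 20 distinct keys, 10 occurrences each, squeezed into only 4 columns.
stream = [f"ip-{i % 20}" for i in range(200)]
truth = Counter(stream)
grid = build(stream, width=4, depth=3)

errors = {key: estimate(grid, key) - truth[key] for key in truth}
assert min(errors.values()) >= 0  # never an underestimate
assert max(errors.values()) > 0   # but some keys are inflated by collisions
```

With 20 keys and 4 columns, at most 3 keys per row can sit alone in a cell, so most keys are overestimated here; a realistic sketch would use far more columns.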

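More columns spread items across more cells in each row, which lowers the magnitude of the error. More rows lower the probability that every one of an item's cells is badly inflated at the same time. You tune both for your accuracy budget.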
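The lesson does not give sizing formulas, but the standard analysis of the count-min sketch offers a common starting point: with width ⌈e/ε⌉ and depth ⌈ln(1/δ)⌉, an estimate exceeds the true count by more than ε times the total stream count with probability at most δ, assuming suitably independent hash functions. The function name `sketch_dimensions` and its parameters are hypothetical.

```python
import math


def sketch_dimensions(epsilon, delta):
    """Return (width, depth) under the standard count-min analysis.

    epsilon: tolerated overcount as a fraction of the total stream count.
    delta:   tolerated probability of exceeding that overcount.
    """
    width = math.ceil(math.e / epsilon)      # columns control error magnitude
    depth = math.ceil(math.log(1 / delta))   # rows control failure probability
    return width, depth


width, depth = sketch_dimensions(epsilon=0.001, delta=0.01)
# width == 2719, depth == 5, so 13,595 counters in total
```

Halving ε doubles the width, while tightening δ only adds rows logarithmically, so confidence is usually much cheaper to buy than precision.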

Key idea

A count-min sketch estimates item frequencies in fixed memory by hashing each item into one cell per row of a counter grid and returning the minimum of those cells, which can overestimate the true count but never underestimates it.

Check yourself

1. Why does a query return the minimum across rows?

2. In which direction does a count-min sketch err?