← Lessons

quiz vs the machine

Platinum1780

System Design

Top K Heavy Hitters

Finding the most frequent items in a stream without counting every key exactly.

6 min read · advanced · beat Platinum to climb

The question

Heavy hitters are the few items that account for most of the volume, like the top trending searches. Finding the top k exactly would mean an exact counter per distinct key, which is infeasible on a high rate stream.

A practical approach

  • Use a count min sketch to estimate each item frequency in fixed memory.
  • Keep a small min heap of size k holding the current top candidates and their estimated counts.
  • On each event, update the sketch, then compare the item estimate against the heap minimum and swap if it is larger.

Why approximation is acceptable

The heavy hitters are by definition frequent, so their estimates are tight and they survive the threshold reliably. Rare items, which dominate the distinct key count, never crowd memory because they are not tracked individually.

Watch outs

  • Concept drift means yesterday top items fade, so combine with decay or windowed counts.
  • Merging per shard heaps needs care because the same item may sit in several shards.

This sketch plus heap pattern is the standard way trending and abuse detection systems surface the loud minority cheaply.

Key idea

Top k heavy hitters combine a frequency sketch with a small heap to surface the loudest items without counting every key exactly.

Check yourself

Answer to earn rating on the learn ladder.

1. What two structures commonly combine to find top k heavy hitters?

2. Why is approximation acceptable for heavy hitters?

3. What complicates merging top k results across shards?