The question
Heavy hitters are the few items that account for most of the volume, like the top trending searches. Finding the top k exactly would mean an exact counter per distinct key, which is infeasible on a high rate stream.
A practical approach
- Use a count min sketch to estimate each item frequency in fixed memory.
- Keep a small min heap of size k holding the current top candidates and their estimated counts.
- On each event, update the sketch, then compare the item estimate against the heap minimum and swap if it is larger.
Why approximation is acceptable
The heavy hitters are by definition frequent, so their estimates are tight and they survive the threshold reliably. Rare items, which dominate the distinct key count, never crowd memory because they are not tracked individually.
Watch outs
- Concept drift means yesterday top items fade, so combine with decay or windowed counts.
- Merging per shard heaps needs care because the same item may sit in several shards.
This sketch plus heap pattern is the standard way trending and abuse detection systems surface the loud minority cheaply.
Key idea
Top k heavy hitters combine a frequency sketch with a small heap to surface the loudest items without counting every key exactly.