Finding the frequent few
The heavy hitters problem asks which items appear more than some fraction of a stream. The Misra Gries algorithm tracks candidates using only a fixed number of counters, far fewer than the number of distinct items.
The counter rules
Maintain at most a chosen number of item to counter pairs. For each arriving item:
- If it already has a counter, increment it.
- Else if a counter slot is free, start a new counter at one.
- Else decrement every counter, removing any that hit zero.
The decrement step is the clever part. It throws away one occurrence of many items at once, which only the truly frequent items can survive.
What the counts mean
The stored counts are underestimates of true frequencies, but they never undercount by more than a bounded amount tied to the stream length and counter budget. Items that are genuinely frequent are guaranteed to remain in the table.
- A second pass can verify exact counts if needed.
- No false negatives among true heavy hitters.
Key idea
Misra Gries keeps a small fixed set of counters and decrements them all when full, which only frequent items survive, giving bounded underestimates that are guaranteed to retain every true heavy hitter.