The problem
Counting distinct items, like unique visitors, exactly needs a set holding every value seen, which is huge for billions of items. HyperLogLog estimates the count in kilobytes with a small, bounded error.
The intuition
- Hash each item to a uniform bit string.
- Track the position of the leftmost run of leading zeros seen. A long run is rare, so seeing one hints that many distinct items passed through.
- Split the hash into registers and average their estimates with a harmonic mean to cut variance.
What you get
- Memory is fixed regardless of how many items arrive.
- Typical error is around a couple of percent, tunable by register count.
- Sketches are mergeable: combine per shard counts by taking the max of each register.
For dashboards that ask how many unique users, an exact answer is overkill and HyperLogLog is the practical choice.
Key idea
HyperLogLog estimates distinct counts in fixed tiny memory by tracking leading zero runs across mergeable registers.