Double Hashing to Avoid Collisions

Pairing two independent hashes to make accidental string matches astronomically rare.

Two fingerprints are safer than one

A single rolling hash can collide: two different substrings sometimes produce the same fingerprint. When matching is run at scale, or when verifying every hit is too expensive, those collisions become a real concern. Double hashing addresses this by computing two independent hashes for each window using different bases and different moduli.

Why two helps so much

Two substrings are treated as equal only when both fingerprints agree. If each hash collides with some small probability, the chance that two unrelated strings collide on both at once is roughly the product of those probabilities. Multiplying two already small numbers gives a far smaller one, pushing accidental matches to negligible levels.

Practical notes

Choose large prime moduli so the hash values spread evenly.
Use distinct bases so the two hashes are not correlated.
Combine the pair into one comparable key, often by treating it as a coordinate pair.

With double hashing, many implementations skip the direct verification entirely and trust the pair, accepting a vanishingly small error probability in exchange for speed.

Key idea

Double hashing combines two independent rolling hashes and accepts a match only when both agree; because the collision probabilities multiply, false matches become astronomically unlikely.

Double Hashing to Avoid Collisions

Two fingerprints are safer than one

Why two helps so much

Practical notes

Key idea

Check yourself