← Lessons

quiz vs the machine

Silver1080

Algorithms

String Hashing for Substring Comparison

Turn substrings into numbers so equality checks become quick comparisons.

4 min read · intro · beat Silver to climb

The idea of a rolling hash

Comparing two substrings character by character is slow when you do it many times. String hashing maps each substring to a single number so that equal substrings map to equal numbers. We treat a string as a number in some base, like a positional numeral system.

Polynomial hashing

Pick a base and a large prime modulus. The hash of a string is the sum of each character value times a power of the base, all taken modulo the prime:

  • The leftmost character gets the highest power.
  • Each following character gets the next lower power.

Precompute a prefix hash of the whole text. Then the hash of any substring can be read off in constant time using a subtraction and a stored power of the base. This is the rolling part: a window can slide and update in one step.

Collisions and safety

Two different substrings can share a hash, called a collision. To make this rare, use a large prime and sometimes two independent hashes at once. A matching hash then signals a very likely match, which you can confirm directly if correctness must be exact.

Key idea

Encode substrings as polynomial numbers under a prime modulus so any two can be compared by a single number lookup.

Check yourself

Answer to earn rating on the learn ladder.

1. What does a precomputed prefix hash let you do quickly?

2. Why use a large prime modulus and sometimes two hashes?

3. How can you guarantee a hash match is a real match?