The idea of a rolling hash
Comparing two substrings character by character is slow when you do it many times. String hashing maps each substring to a single number so that equal substrings map to equal numbers. We treat a string as a number in some base, like a positional numeral system.
Polynomial hashing
Pick a base and a large prime modulus. The hash of a string is the sum of each character value times a power of the base, all taken modulo the prime:
- The leftmost character gets the highest power.
- Each following character gets the next lower power.
Precompute a prefix hash of the whole text. Then the hash of any substring can be read off in constant time using a subtraction and a stored power of the base. This is the rolling part: a window can slide and update in one step.
Collisions and safety
Two different substrings can share a hash, called a collision. To make this rare, use a large prime and sometimes two independent hashes at once. A matching hash then signals a very likely match, which you can confirm directly if correctness must be exact.
Key idea
Encode substrings as polynomial numbers under a prime modulus so any two can be compared by a single number lookup.