A self similarity measure
The prefix function of a string answers one question at every position: what is the length of the longest proper prefix that is also a suffix ending here. A proper prefix excludes the whole string itself. This value captures how much the beginning of the string reappears later inside it.
A small example
For the string a b a b a the prefix function at the final character is three, because the prefix a b a equals the suffix a b a. That overlap is exactly the kind of structure naive matching ignores.
Why it is useful
- It is computed once over the pattern before any searching.
- It tells you, after a partial match fails, how far the matched portion already repeats.
- That repetition lets a matcher resume instead of restarting from scratch.
The key insight is that if you already matched several characters and then hit a mismatch, some suffix of what you matched might equal a prefix of the pattern. The prefix function tells you precisely how long that reusable overlap is, so progress is never thrown away entirely.
Key idea
The prefix function records, for each position, the longest proper prefix that is also a suffix; this self overlap is the precomputed knowledge that lets matching skip redundant comparisons.