Tolerating mistakes
Users misspell words. A fuzzy match still finds the intended term by allowing a small number of differences. The standard measure is edit distance, the minimum count of single character insertions, deletions, or substitutions to turn one word into another. With transpositions added it becomes Damerau edit distance.
Why brute force fails
Comparing a query word against every term in a huge vocabulary is too slow. Two ideas make fuzzy search practical.
- Bounding limits the allowed distance, usually one or two edits, which prunes most candidates immediately.
- A finite state automaton built for the query accepts exactly the strings within that distance, so the engine walks the index and the automaton together to enumerate matches efficiently.
Cost and ranking
Fuzzy matching widens recall but can pull in unrelated words, so engines often rank exact matches above edited ones and cap the edit distance based on word length. Very short words get fewer allowed edits to avoid noise.
Key idea
Fuzzy search matches near misses by bounding edit distance and using an automaton so typos still find the right terms without scanning the whole vocabulary.