← Lessons

quiz vs the machine

Gold1440

System Design

Fuzzy Search and Edit Distance

Matching queries despite typos by allowing a few character changes.

5 min read · core · beat Gold to climb

Tolerating mistakes

Users misspell words. A fuzzy match still finds the intended term by allowing a small number of differences. The standard measure is edit distance, the minimum count of single character insertions, deletions, or substitutions to turn one word into another. With transpositions added it becomes Damerau edit distance.

Why brute force fails

Comparing a query word against every term in a huge vocabulary is too slow. Two ideas make fuzzy search practical.

  • Bounding limits the allowed distance, usually one or two edits, which prunes most candidates immediately.
  • A finite state automaton built for the query accepts exactly the strings within that distance, so the engine walks the index and the automaton together to enumerate matches efficiently.

Cost and ranking

Fuzzy matching widens recall but can pull in unrelated words, so engines often rank exact matches above edited ones and cap the edit distance based on word length. Very short words get fewer allowed edits to avoid noise.

Key idea

Fuzzy search matches near misses by bounding edit distance and using an automaton so typos still find the right terms without scanning the whole vocabulary.

Check yourself

Answer to earn rating on the learn ladder.

1. What does edit distance count?

2. Why bound the allowed edit distance?