The task
Spell correction detects misspelled words and proposes fixes. It powers search boxes, keyboards, and text cleanup before downstream NLP.
Two kinds of errors
- Non word errors produce a string that is not in the dictionary, like teh for the. These are easy to flag.
- Real word errors produce a valid but wrong word, like form for from. These need context to catch.
The noisy channel view
A clean correction balances two signals.
- An error model says how likely a typo is, since nearby keys and single letter swaps are common.
- A language model says how likely the candidate word is in this context.
- The best correction maximizes the product of the two.
So three becomes the over there only when context favors it.
Generating candidates
Generate candidate words within a small edit distance of the typo, then score each with the error and language models and pick the best. Limiting edit distance keeps the candidate set small and fast.
Key idea
Spell correction generates candidates within a small edit distance and ranks them by a noisy channel that multiplies an error model by a context language model, which is what catches real word errors.