← Lessons

quiz vs the machine

Gold1310

Machine Learning

The Spell Correction NLP

Detecting and fixing misspellings using error and context models.

4 min read · core · beat Gold to climb

The task

Spell correction detects misspelled words and proposes fixes. It powers search boxes, keyboards, and text cleanup before downstream NLP.

Two kinds of errors

  • Non word errors produce a string that is not in the dictionary, like teh for the. These are easy to flag.
  • Real word errors produce a valid but wrong word, like form for from. These need context to catch.

The noisy channel view

A clean correction balances two signals.

  • An error model says how likely a typo is, since nearby keys and single letter swaps are common.
  • A language model says how likely the candidate word is in this context.
  • The best correction maximizes the product of the two.

So three becomes the over there only when context favors it.

Generating candidates

Generate candidate words within a small edit distance of the typo, then score each with the error and language models and pick the best. Limiting edit distance keeps the candidate set small and fast.

Key idea

Spell correction generates candidates within a small edit distance and ranks them by a noisy channel that multiplies an error model by a context language model, which is what catches real word errors.

Check yourself

Answer to earn rating on the learn ladder.

1. What kind of error is form used for from?

2. In the noisy channel model, the best correction maximizes what?