← Lessons

quiz vs the machine

Gold1440

Algorithms

The Edit Distance Recurrence

Counting the minimum insert, delete, and replace operations between two strings.

6 min read · core · beat Gold to climb

How different are two strings

The edit distance, also called Levenshtein distance, is the minimum number of single character operations needed to turn one string into another. The allowed operations are insert, delete, and replace.

The recurrence

Let the two strings be A and B. Define the distance between the first i characters of A and the first j characters of B. The recurrence considers the last characters:

  • If the last characters are equal, no operation is needed there, so the cost equals the distance of the shorter prefixes.
  • If they differ, take the cheapest of three choices and add one: a replace uses both shorter prefixes, a delete drops the last character of A, and an insert adds the last character of B.

The base cases are simple: turning a string into the empty string costs one operation per character.

Filling the table

A table indexed by i and j is filled row by row, each cell depending only on the cell above, the cell to the left, and the diagonal one. The bottom right corner holds the answer. The work is proportional to the product of the two lengths, and only two rows are needed at a time if you want to save memory.

Key idea

Edit distance is filled with a table where each cell reuses the diagonal when characters match and otherwise adds one to the cheapest of replace, delete, and insert; the corner cell gives the minimum operations.

Check yourself

Answer to earn rating on the learn ladder.

1. Which three operations does Levenshtein edit distance allow?

2. When the last characters of the two prefixes are equal, the cell value is what?

3. What does the bottom right cell of the table represent?