← Lessons

quiz vs the machine

Gold1410

Algorithms

The Longest Common Subsequence Recurrence

Finding the longest order preserving match shared by two sequences.

6 min read · core · beat Gold to climb

A subsequence, not a substring

A subsequence keeps characters in their original order but allows gaps, unlike a substring which must be contiguous. The longest common subsequence, or LCS, of two strings is the longest sequence that appears in both, in order, possibly with gaps between picks. For a b c b d and b d c a b the LCS has length three.

The recurrence

Define the LCS length of the first i characters of A and the first j characters of B. Looking at the last characters:

  • If they are equal, that character extends a match, so the answer is one plus the LCS of the two shorter prefixes via the diagonal.
  • If they differ, the last characters cannot both be used, so take the better of two options: drop the last character of A, or drop the last character of B.

The base case is zero whenever either prefix is empty.

Building the answer

A table of length i by length j is filled in increasing order; each cell reads the diagonal, the left, and the up neighbour. The bottom right holds the LCS length, and the actual subsequence is recovered by walking backward through the choices. Work is proportional to the product of the two lengths.

Key idea

LCS fills a table where equal last characters add one to the diagonal and unequal ones take the larger of left or up; the corner gives the length and a backward walk recovers the shared subsequence.

Check yourself

Answer to earn rating on the learn ladder.

1. How does a subsequence differ from a substring?

2. When the last characters match, the LCS cell becomes what?

3. When the last characters differ, what is taken?