← Lessons

quiz vs the machine

Platinum1780

Algorithms

Sequence Alignment

Scoring matches, mismatches, and gaps to line up two sequences optimally.

7 min read · advanced · beat Platinum to climb

Lining up with scores instead of counts

Sequence alignment generalises edit distance and LCS. Instead of counting operations, it assigns a score to every column of an alignment: a reward for a match, a penalty for a mismatch, and a penalty for a gap where one sequence has a character and the other has a blank. The goal is the alignment with the best total score. This framing is central in computational biology for comparing DNA and proteins.

The dynamic program

Like edit distance, alignment fills a table over prefixes. Each cell chooses the best of three moves:

  • Align the two current characters, adding a match or mismatch score from the diagonal.
  • Gap in one sequence, coming from the left with a gap penalty.
  • Gap in the other sequence, coming from above with a gap penalty.

The global version, known as Needleman Wunsch, optimises over the whole strings. A local variant, Smith Waterman, adds a zero floor so the alignment can start fresh anywhere, finding the best matching region rather than forcing the ends to line up.

Gap penalties

Real biology often uses an affine gap model, charging more to open a gap than to extend one, since a single long insertion is more plausible than many scattered ones. This needs a slightly richer recurrence that tracks whether a gap is currently open.

Key idea

Sequence alignment scores matches, mismatches, and gaps and fills a prefix table choosing the best of align, gap left, or gap up; global alignment spans whole strings while a zero floor yields the best local region.

Check yourself

Answer to earn rating on the learn ladder.

1. How does alignment differ from plain edit distance?

2. What does the zero floor in local alignment achieve?

3. Why use an affine gap model?