← Lessons

quiz vs the machine

Gold1480

Algorithms

Longest Common Substring With Suffix

Finding the longest stretch shared by two strings using suffix machinery, not slow tables.

5 min read · core · beat Gold to climb

The problem

Given two strings, the longest common substring is the longest contiguous block that appears in both. Note this differs from the longest common subsequence, which allows gaps.

The combined string trick

A classic approach concatenates both strings with a unique separator between them, then builds a suffix structure such as a suffix array with its longest common prefix array, or a generalized suffix tree.

  • The separator prevents matches that straddle the two strings.
  • We tag each suffix by which original string it came from.

Reading off the answer

In the sorted suffix order, scan adjacent suffixes. Whenever two neighbors come from different source strings, their longest common prefix is a candidate common substring. The maximum such prefix over all valid neighbor pairs is the answer.

Why structures beat the table

The straightforward dynamic programming table costs the product of the two lengths in time and space. Suffix based methods run closer to the sum of the lengths, a big win on long inputs.

Key idea

Joining two strings with a separator and scanning adjacent suffixes from different sources finds their longest common substring in time near the combined length, beating the product sized table.

Check yourself

Answer to earn rating on the learn ladder.

1. How does the longest common substring differ from longest common subsequence?

2. Why is a unique separator inserted between the two strings?