← Lessons

quiz vs the machine

Platinum1760

Algorithms

The Longest Repeated Substring

Spotting the longest stretch that appears at least twice using sorted suffixes.

5 min read · advanced · beat Platinum to climb

The question

The longest repeated substring is the longest block of characters that occurs two or more times in a single string, possibly overlapping. Brute force comparison of all pairs of positions is far too slow on long inputs.

Suffixes hold the answer

A repeated substring is a common prefix of two different suffixes of the string. So if we sort all suffixes lexicographically, any repeat shows up as shared leading characters between neighboring suffixes in that order.

Using the prefix lengths

Build the suffix array, then the longest common prefix array that records how many leading characters each pair of adjacent sorted suffixes shares.

  • The largest value in that array is the length of the longest repeated substring.
  • Its position points back to where the repeat begins.

Why neighbors suffice

Sorting clusters similar suffixes together, so the deepest shared prefix must occur between two adjacent entries. Checking only neighbors, rather than all pairs, brings the cost down to roughly the sum governed by suffix array construction.

Key idea

The longest repeated substring is the deepest common prefix of two adjacent sorted suffixes, so a suffix array with its prefix length array finds it without comparing all position pairs.

Check yourself

Answer to earn rating on the learn ladder.

1. A repeated substring corresponds to what relationship between suffixes?

2. Where does the answer appear after sorting suffixes?