The Longest Repeated Substring

The question

The longest repeated substring is the longest block of characters that occurs two or more times in a single string, possibly overlapping. Brute force comparison of all pairs of positions is far too slow on long inputs.

Suffixes hold the answer

A repeated substring is a common prefix of two different suffixes of the string. So if we sort all suffixes lexicographically, any repeat shows up as shared leading characters between neighboring suffixes in that order.

Using the prefix lengths

Build the suffix array, then the longest common prefix array that records how many leading characters each pair of adjacent sorted suffixes shares.

The largest value in that array is the length of the longest repeated substring.
Its position points back to where the repeat begins.

Why neighbors suffice

Sorting clusters similar suffixes together, so the deepest shared prefix must occur between two adjacent entries. Checking only neighbors, rather than all pairs, brings the cost down to roughly the sum governed by suffix array construction.

Key idea

The longest repeated substring is the deepest common prefix of two adjacent sorted suffixes, so a suffix array with its prefix length array finds it without comparing all position pairs.

The Longest Repeated Substring

The question

Suffixes hold the answer

Using the prefix lengths

Why neighbors suffice

Key idea

Check yourself