What it is
A suffix array is the list of all starting positions of a string's suffixes, sorted by the suffixes in dictionary order. It captures the same information as a suffix tree but in a compact array of integers.
Why sorted suffixes help
Because the suffixes are sorted, every occurrence of a pattern appears as a contiguous block in the array. To find a pattern you binary search for the block whose suffixes start with it. This turns substring search into a clean range lookup.
Building it efficiently
A naive build sorts the suffixes directly, which is slow because comparing two suffixes can scan many characters. The standard speedup sorts by doubling:
- First rank suffixes by their first character.
- Then by their first two characters, reusing the previous ranks.
- Then by four, then eight, each round combining two ranks already computed.
After a few doubling rounds the full order is fixed. A companion array of longest common prefixes between neighbors then unlocks many string queries, such as counting distinct substrings.
Key idea
Sort suffixes by doubling the compared length so any pattern's matches form one contiguous, binary searchable block.