← Lessons

quiz vs the machine

Gold1420

Algorithms

Suffix Array Construction

Sorting every suffix of a string into a compact index that powers fast queries.

6 min read · core · beat Gold to climb

What a suffix array is

A suffix array lists the starting positions of all suffixes of a string, sorted in lexicographic order. It stores only integers, so it is far more compact than an explicit suffix tree while supporting many of the same queries.

The naive build

You could sort the suffixes directly by comparing them, but each comparison can scan many characters, making the naive build slow on long strings.

Prefix doubling

A common practical method is prefix doubling. We sort suffixes by their first character, then by their first two characters, then four, then eight, doubling the considered length each round.

  • Each round reuses ranks from the previous round as sort keys.
  • After a logarithmic number of rounds the full order is fixed.

This gives a construction cost of roughly the string length times a logarithmic factor, and specialized linear time algorithms also exist.

The companion array

Searches often pair the suffix array with the longest common prefix array, which records how many leading characters adjacent sorted suffixes share. Together they answer substring queries quickly.

Key idea

A suffix array stores all suffix start positions in sorted order, and prefix doubling reuses ranks each round to build it efficiently for compact, query ready string indexing.

Check yourself

Answer to earn rating on the learn ladder.

1. What does a suffix array store?

2. How does prefix doubling speed up construction?