← Lessons

quiz vs the machine

Platinum1850

Algorithms

The Aho Corasick Automaton

Match many patterns at once with a trie plus fallback links.

6 min read · advanced · beat Platinum to climb

Matching a dictionary

You have a whole dictionary of patterns and one long text, and you want every occurrence of every pattern. Running a single pattern matcher once per pattern is wasteful. The Aho Corasick automaton scans the text once and reports them all.

Trie plus failure links

First build a trie of all the patterns, so shared prefixes share a path. Then add a failure link from each node to the longest proper suffix of the matched text that is also a prefix somewhere in the trie.

  • While scanning the text, follow the trie edge for the next character if one exists.
  • If no edge exists, follow failure links until an edge does exist, or you fall back to the root.

A failure link is exactly what lets you avoid restarting after a partial match breaks, by jumping to the longest still valid suffix.

Reporting matches

Each node also carries an output link chain pointing to any patterns that end at a suffix of the current path, so a single position can report several matches. The whole scan is linear in the text length plus the number of matches.

Key idea

Combine a trie of all patterns with failure links so one pass over the text finds every match without restarting.

Check yourself

Answer to earn rating on the learn ladder.

1. What is the role of a failure link?

2. Why build a trie of all patterns together?

3. What lets one position report several matches?