Beyond LIKE matching
A LIKE pattern is fine for substrings but poor at natural language; it cannot rank results, handle word variants, or ignore noise words. Full text search is built for documents, matching by meaningful words rather than raw characters.
How text becomes searchable
The engine transforms text in stages:
- Tokenization splits text into words.
- Normalization lowercases and strips punctuation.
- Stemming reduces words to a root, so running and ran can match run.
- Stop word removal drops common words like the and and to save space.
The result is a list of normalized terms stored in an inverted index, which maps each term to the documents containing it. This makes word lookups fast even across millions of rows.
Querying and ranking
A full text query is parsed into terms and combined with operators for and, or, and phrase matching. Crucially, the engine computes a relevance score, often based on term frequency, so you can ORDER BY rank and return the best matches first.
Engine specifics
Postgres uses tsvector and tsquery with a GIN index. MySQL offers FULLTEXT indexes with natural language and boolean modes. The concepts of tokenize, stem, index, and rank carry across them.
Key idea
Full text search tokenizes, stems, and indexes text into an inverted index of terms, enabling fast word aware matching with relevance ranking that plain LIKE cannot provide.