Beyond LIKE
A LIKE query with a leading wildcard, such as LIKE '%run%', typically has to scan the text of every row, and it can neither rank results nor recognize different forms of the same word. Full text search instead treats text as a bag of words, so you can find documents that contain given terms and order them by relevance.
How It Works
- Text is tokenized into individual words.
- A stemmer reduces words to a root, so "running" and "runs" both match "run".
- Common stop words like "the" and "is" are dropped to save space, since they appear in nearly every document and do little to tell documents apart.
- The result is stored in an inverted index that maps each term to the documents holding it.
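The steps above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular database implements it: the stop word list is a tiny made-up sample, and toy_stem is a hypothetical, deliberately crude suffix stripper standing in for a real stemming algorithm.

```python
import re
from collections import defaultdict

# Tiny illustrative stop word list (an assumption, not a standard list).
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "to", "in"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def toy_stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Collapse a doubled final consonant left behind: "runn" -> "run".
    if len(word) >= 4 and word[-1] == word[-2] and word[-1] not in "aeiou":
        word = word[:-1]
    return word

def analyze(text):
    """Tokenize, drop stop words, then stem what remains."""
    return [toy_stem(t) for t in tokenize(text) if t not in STOP_WORDS]

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = analyze(query)
    if not terms:
        return set()
    return set.intersection(*(set(index.get(t, set())) for t in terms))

docs = {
    1: "The dog is running in the park",
    2: "She runs every morning",
    3: "A quiet morning in the park",
}
index = build_index(docs)
```

Here search(index, "run") finds documents 1 and 2 even though neither contains the literal word "run", because the query goes through the same analysis as the documents. Applying identical processing to both sides is what makes the lookup a simple dictionary access rather than a scan.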
Ranking
Search engines score documents so the best matches appear first. A common scheme, TF-IDF (term frequency times inverse document frequency), weighs how often a term appears in a document against how rare the term is across all documents.
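The frequency-versus-rarity weighting can be sketched as follows. This is one simple variant under stated assumptions: term frequency is the term's share of the document's words, rarity is log(N / df), and the documents are assumed to be already tokenized and stemmed. Real engines use more refined formulas, and the function names here are illustrative only.

```python
import math
from collections import Counter

# Documents as lists of already-analyzed terms (illustrative data).
docs = {
    1: ["dog", "run", "park"],
    2: ["run", "run", "morn"],
    3: ["quiet", "morn", "park"],
}

def idf(term, docs):
    """Rarity of a term: log of (total documents / documents containing it)."""
    df = sum(1 for terms in docs.values() if term in terms)
    return math.log(len(docs) / df) if df else 0.0

def score(query_terms, doc_terms, docs):
    """Sum, over the query terms, of in-document frequency times rarity."""
    counts = Counter(doc_terms)
    return sum((counts[t] / len(doc_terms)) * idf(t, docs) for t in query_terms)

def rank(query_terms, docs):
    """Return (doc_id, score) pairs for matching documents, best first."""
    scored = [(doc_id, score(query_terms, terms, docs)) for doc_id, terms in docs.items()]
    matches = [pair for pair in scored if pair[1] > 0]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)
```

With this data, rank(["run"], docs) puts document 2 ahead of document 1 because "run" makes up a larger share of its words. Adding "dog" to the query moves document 1 to the top, since "dog" occurs in only one document and therefore carries more weight than the more widespread "run".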
Tradeoffs
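Full text indexes add write cost and storage, because every insert or update must also update the index. Stemming choices affect what matches: an aggressive stemmer can conflate unrelated words, while a conservative one misses legitimate variants. Full text search shines for human language but is overkill for exact code or identifier lookups.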
Key idea
Full text search tokenizes and stems text into an inverted index, enabling relevance-ranked word and phrase matching that plain LIKE cannot offer.