System Design

Semantic Search with Embeddings

Matching meaning, not just words, by representing text as vectors.

5 min read · advanced

Beyond keywords

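Keyword search misses results that express the same meaning in different words. For example, a plain keyword match for "car repair" will not find a document that only says "automobile maintenance". Semantic search represents text as embeddings: vectors arranged so that texts with similar meanings sit close together.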

How it works

  • An encoder model turns each document into a vector at index time.
  • The same encoder turns the query into a vector at query time.
  • Retrieval finds documents whose vectors are nearest to the query vector.
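
The three steps above can be sketched in a few lines. This is a minimal illustration, not a real pipeline: `encode` is a hypothetical stand-in for an encoder model and simply looks up hand-made 3-dimensional vectors, and cosine similarity is assumed as the closeness measure.

```python
import math

# Hypothetical stand-in for an encoder model: hand-made 3-d vectors.
# A real encoder would compute these from the text itself.
TOY_VECTORS = {
    "How do I reset my password?": [0.9, 0.1, 0.0],
    "Shipping times for international orders": [0.0, 0.2, 0.9],
    "Refund policy for damaged items": [0.1, 0.9, 0.2],
    "I forgot my login credentials": [0.8, 0.2, 0.1],
}

def encode(text):
    """Return the (toy) embedding for a piece of text."""
    return TOY_VECTORS[text]

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

documents = [
    "How do I reset my password?",
    "Shipping times for international orders",
    "Refund policy for damaged items",
]

# Index time: encode every document once and store the vectors.
index = [(doc, encode(doc)) for doc in documents]

def search(query, k=2):
    """Query time: encode the query, then rank documents by similarity."""
    q = encode(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

results = search("I forgot my login credentials")
print(results)
```

The query shares no words with the password document, yet that document ranks first because its toy vector points in nearly the same direction as the query's.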

Approximate nearest neighbor

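Exact nearest neighbor search compares the query against every stored vector, so its cost grows linearly with the collection and becomes too slow at millions of vectors. Systems therefore use an approximate nearest neighbor (ANN) index, which trades a small amount of accuracy for a large speedup by searching only the promising regions of the vector space.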
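
One way to search only promising regions is to partition the vectors into cells and scan just the few cells closest to the query. The sketch below assumes this inverted-file style of index, which is only one of several ANN approaches. The centroids are sampled data points rather than trained cluster centers, and all names (`build_index`, `ann_search`, `nprobe`) are illustrative rather than taken from any library.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def build_index(vectors, centroids):
    """Assign each vector id to the cell of its most similar centroid."""
    cells = {c: [] for c in range(len(centroids))}
    for vid, v in enumerate(vectors):
        best = max(range(len(centroids)), key=lambda c: dot(v, centroids[c]))
        cells[best].append(vid)
    return cells

def ann_search(query, vectors, centroids, cells, nprobe=4, k=3):
    """Scan only the nprobe cells whose centroids are closest to the query."""
    order = sorted(range(len(centroids)),
                   key=lambda c: dot(query, centroids[c]), reverse=True)
    candidates = [vid for c in order[:nprobe] for vid in cells[c]]
    ranked = sorted(candidates, key=lambda vid: dot(query, vectors[vid]),
                    reverse=True)
    return ranked[:k], len(candidates)

def exact_search(query, vectors, k=3):
    """Baseline: compare the query against every vector."""
    ranked = sorted(range(len(vectors)),
                    key=lambda vid: dot(query, vectors[vid]), reverse=True)
    return ranked[:k]

# Synthetic unit vectors stand in for real embeddings (an assumption).
rng = random.Random(0)
dim = 8
vectors = [normalize([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(1000)]
centroids = [vectors[i] for i in rng.sample(range(len(vectors)), 16)]
cells = build_index(vectors, centroids)
query = normalize([rng.gauss(0, 1) for _ in range(dim)])

approx_ids, scanned = ann_search(query, vectors, centroids, cells, nprobe=4)
exact_ids = exact_search(query, vectors)
print(f"scanned {scanned} of {len(vectors)} vectors")
print("overlap with exact top-3:", len(set(approx_ids) & set(exact_ids)))
```

Raising `nprobe` scans more cells, which recovers more of the exact results at the cost of speed. Probing every cell reduces to exact search.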

Trade-offs

  • Strength: recall on paraphrases and concepts that keyword search misses.
  • Weakness: exact term matching, such as specific codes or names, where keyword search excels.

Because of this, embeddings are usually combined with keyword retrieval rather than replacing it.
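
The lesson does not say how the two result lists are merged. One common option is reciprocal rank fusion, sketched here as an assumption: each document earns 1 / (k + rank) from every list it appears in, and the sums decide the final order. The document ids and the constant `k=60` are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results from the two retrievers for the same query.
keyword_hits = ["doc_sku", "doc_password", "doc_returns"]
semantic_hits = ["doc_password", "doc_login", "doc_sku"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)
```

Documents that both retrievers return rise to the top, while a document found by only one retriever still stays in the list.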

Key idea

Semantic search encodes text as vectors and retrieves nearest neighbors approximately, capturing meaning that keyword matching misses.

Check yourself

Answer to earn rating on the learn ladder.

1. What does an embedding represent?

2. Why use approximate nearest neighbor instead of exact search?

3. Where does keyword search still beat semantic search?