


Machine Learning

Embedding Similarity Search

Finding the closest vectors fast, the engine behind semantic search.

5 min read · core

Meaning as distance

When text or images become embeddings, similar items end up near each other. Similarity search finds the vectors closest to a query vector, which powers semantic search and recommendations.
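The simplest form of similarity search is an exact (brute-force) scan. The sketch below is plain Python and assumes cosine similarity as the closeness measure. The function names and toy vectors are illustrative and do not come from any particular library. It scores every stored vector against the query and keeps the k highest.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; ignores vector length.

    Assumes neither vector is all zeros (that would divide by zero).
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_top_k(query, vectors, k):
    """Score every stored vector against the query and keep the k best.

    Returns (similarity, index) pairs, highest similarity first.
    """
    scored = ((cosine_similarity(query, v), i) for i, v in enumerate(vectors))
    return heapq.nlargest(k, scored)


# Toy 2-D "embeddings" (hypothetical data).
docs = [
    [1.0, 0.0],   # 0
    [0.9, 0.1],   # 1
    [0.0, 1.0],   # 2
    [-1.0, 0.0],  # 3
]
print([i for _, i in exact_top_k([1.0, 0.05], docs, k=2)])  # [0, 1]
```

Each query costs one comparison per stored vector, which is fine for thousands of items but becomes the bottleneck at millions.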

Measuring closeness

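A very common measure is cosine similarity, which compares the angle between two vectors and ignores their length. Squared Euclidean distance is also used. For normalized (unit-length) vectors the two rank results the same way, because the squared distance between two unit vectors equals 2 minus twice their cosine similarity.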
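A quick numerical check of that equivalence, written as a plain-Python sketch with made-up vectors. After normalization, cosine similarity reduces to a dot product, and sorting by highest cosine gives the same order as sorting by smallest squared distance.

```python
import math


def normalize(v):
    """Scale v to unit length (assumes v is not the zero vector)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


query = normalize([3.0, 4.0])
items = [normalize(v) for v in ([1.0, 0.0], [2.0, 2.0], [0.0, 5.0], [-1.0, 1.0])]

# For unit vectors, cosine similarity is just the dot product, and
# ||q - x||^2 = ||q||^2 + ||x||^2 - 2 q.x = 2 - 2 cos(q, x).
for x in items:
    assert math.isclose(squared_euclidean(query, x), 2 - 2 * dot(query, x))

by_cosine = sorted(range(len(items)), key=lambda i: -dot(query, items[i]))
by_distance = sorted(range(len(items)), key=lambda i: squared_euclidean(query, items[i]))
print(by_cosine == by_distance)  # True
```

Because the identity only holds at unit length, many systems normalize embeddings once at indexing time and then use whichever measure their index supports.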

Searching at scale

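Comparing a query against every stored vector one by one (exact, brute-force search) costs time proportional to the size of the collection, which becomes slow at millions of vectors. Production systems therefore use approximate nearest neighbor (ANN) indexes, which trade a little accuracy for a large speedup.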

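  • Methods like HNSW (Hierarchical Navigable Small World) build a navigable graph of vectors
  • Others, such as inverted-file (IVF) indexes, cluster vectors and search only the clusters nearest the query
  • These can typically return the top results in milliseconds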
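The clustering approach from the list can be sketched as a toy inverted-file index in plain Python. One assumption keeps it short: the centroids are a random sample of the data, whereas a real IVF index usually trains them with k-means. The class and parameter names (`ClusterIndex`, `n_clusters`, `n_probe`) are hypothetical. At query time only the vectors in the `n_probe` closest clusters are scored.

```python
import heapq
import math
import random


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class ClusterIndex:
    """Toy IVF-style index (hypothetical; centroids sampled, not k-means)."""

    def __init__(self, vectors, n_clusters, seed=0):
        self.vectors = vectors
        rng = random.Random(seed)
        # Assumption: use randomly sampled data points as cluster centroids.
        self.centroids = rng.sample(vectors, n_clusters)
        # Inverted lists: cluster id -> indices of the vectors assigned to it.
        self.lists = [[] for _ in range(n_clusters)]
        for i, v in enumerate(vectors):
            self.lists[self._nearest_centroids(v, 1)[0]].append(i)

    def _nearest_centroids(self, v, n):
        """Ids of the n centroids most similar to v."""
        scores = ((cosine_similarity(v, c), j) for j, c in enumerate(self.centroids))
        return [j for _, j in heapq.nlargest(n, scores)]

    def search(self, query, k, n_probe=1):
        """Score only vectors in the n_probe clusters closest to the query.

        May return fewer than k ids if the probed clusters are small.
        """
        candidates = []
        for j in self._nearest_centroids(query, n_probe):
            candidates.extend(self.lists[j])
        scored = ((cosine_similarity(query, self.vectors[i]), i) for i in candidates)
        return [i for _, i in heapq.nlargest(k, scored)]


rng = random.Random(42)
data = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(1000)]  # random toy data
index = ClusterIndex(data, n_clusters=20)
query = [rng.gauss(0, 1) for _ in range(8)]
print(index.search(query, k=5, n_probe=3))
```

`n_probe` is the accuracy/speed knob. Probing more clusters raises the chance of finding the true neighbors but scores more vectors. With `n_probe` equal to `n_clusters`, the search becomes the exact scan again.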

The cost is that a search might occasionally miss a true nearest neighbor, but the speed gain makes large-scale search practical.
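One common way to quantify how often an approximate index misses true neighbors is recall@k: the fraction of the exact top-k results that the approximate search also returned. The following is a minimal sketch. The neighbor ids are made up for illustration and are not measured results.

```python
def recall_at_k(exact_ids, approx_ids, k):
    """Fraction of the exact top-k neighbors that also appear in the approximate top-k."""
    truth = set(exact_ids[:k])
    found = truth.intersection(approx_ids[:k])
    return len(found) / k


# Hypothetical results for one query: exact scan vs. an ANN index.
exact_ids = [4, 7, 1, 9, 3]
approx_ids = [4, 1, 7, 2, 3]   # missed neighbor 9, returned 2 instead
print(recall_at_k(exact_ids, approx_ids, k=5))  # 0.8
```

In practice, recall is usually averaged over many queries and reported alongside query latency when tuning index parameters.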

Key idea

Similarity search ranks embeddings by closeness, using approximate nearest neighbor indexes to stay fast over millions of vectors.

Check yourself


1. What does cosine similarity compare?

2. Why use approximate nearest neighbor search?