Meaning as distance
When text or images become embeddings, similar items end up near each other. Similarity search finds the vectors closest to a query vector, which powers semantic search and recommendations.
Measuring closeness
The most common measure is cosine similarity, which compares the angle between two vectors and ignores their length. Squared Euclidean distance is also used. For normalized vectors these two rank results the same way.
Searching at scale
Comparing a query against millions of vectors one by one is slow. Real systems use approximate nearest neighbor indexes that trade a little accuracy for huge speed.
- Methods like HNSW build a navigable graph of vectors
- Others cluster vectors and search only nearby clusters
- These return the top results in milliseconds
The cost is that you might occasionally miss a true nearest neighbor, but the speed gain makes large scale search practical.
Key idea
Similarity search ranks embeddings by closeness, using approximate nearest neighbor indexes to stay fast over millions of vectors.