Matching meaning, not words
Lexical search misses a result that means the same thing with different words. Vector search embeds both the query and each document into a high dimensional vector with a model, so semantically similar items sit close together. Retrieval becomes finding the nearest neighbors of the query vector.
Why approximate search
Exact nearest neighbor over millions of vectors means comparing the query to every vector, which is too slow. Engines use approximate nearest neighbor indexes that trade a little accuracy for huge speed:
- Graph indexes like hierarchical navigable small world link vectors and walk toward the query.
- Quantization compresses vectors so more fit in memory and comparisons are cheaper.
The key knob is recall versus latency, since approximation may miss a true neighbor.
What it does not give you
Vector search captures topical similarity but can be weak on exact terms like a part number or a rare name, where lexical matching shines. It also depends entirely on the embedding model, so changing models means re embedding the corpus.
Key idea
Vector search embeds queries and documents into a shared space and uses approximate nearest neighbor indexes to retrieve semantically similar results, trading some recall for speed.