The exact search wall
Finding the true nearest vector to a query is easy in theory, just compare the query to every stored vector and keep the closest. This is exact search, but it costs one comparison per item. With millions of vectors and many queries per second, scanning everything becomes too slow.
The approximate bargain
Approximate nearest neighbor, or ANN, trades a little accuracy for a large speed gain. Instead of guaranteeing the true closest match, it returns vectors that are almost certainly among the closest, while touching only a small fraction of the data.
Why a near miss is acceptable
- In semantic search the top results are already similar, so a slightly different ordering rarely hurts the user.
- The speedup can be a hundred times or more, which makes large scale search feasible at all.
How quality is judged
We measure ANN with recall, the fraction of true nearest neighbors that the approximate method actually found. The art of ANN is reaching high recall while keeping the search fast.
Key idea
Exact nearest neighbor search scales poorly, so ANN accepts a small recall loss in exchange for searching only a fraction of the data at far higher speed.