← Lessons

quiz vs the machine

Silver1130

Machine Learning

The Approximate Nearest Neighbor Problem

Why exact nearest neighbor search does not scale, and the bargain we strike.

4 min read · intro · beat Silver to climb

The exact search wall

Finding the true nearest vector to a query is easy in theory, just compare the query to every stored vector and keep the closest. This is exact search, but it costs one comparison per item. With millions of vectors and many queries per second, scanning everything becomes too slow.

The approximate bargain

Approximate nearest neighbor, or ANN, trades a little accuracy for a large speed gain. Instead of guaranteeing the true closest match, it returns vectors that are almost certainly among the closest, while touching only a small fraction of the data.

Why a near miss is acceptable

  • In semantic search the top results are already similar, so a slightly different ordering rarely hurts the user.
  • The speedup can be a hundred times or more, which makes large scale search feasible at all.

How quality is judged

We measure ANN with recall, the fraction of true nearest neighbors that the approximate method actually found. The art of ANN is reaching high recall while keeping the search fast.

Key idea

Exact nearest neighbor search scales poorly, so ANN accepts a small recall loss in exchange for searching only a fraction of the data at far higher speed.

Check yourself

Answer to earn rating on the learn ladder.

1. What does approximate nearest neighbor search trade away?

2. How is the quality of an ANN method measured?