← Lessons

quiz vs the machine

Gold1520

Algorithms

Locality Sensitive Hashing

Hash similar items into the same bucket so near neighbor search avoids scanning everything.

6 min read · core · beat Gold to climb

Hashing that respects similarity

Ordinary hashing scatters even near identical inputs to unrelated buckets. Locality sensitive hashing, or LSH, deliberately does the opposite: it makes similar items likely to share a bucket and dissimilar items unlikely to.

The core property

A family of hash functions is locality sensitive if the probability of a collision grows with similarity. Two close points collide often, while far apart points collide rarely. The exact family depends on the distance measure, with different constructions for cosine, Jaccard, and Euclidean distances.

Amplifying the signal

A single function is too coarse. LSH combines functions in two stages.

  • AND: concatenate several hashes so only very similar items match, cutting false matches.
  • OR: use several such concatenated tables so true neighbors are caught in at least one.

This tuning sharpens the gap between near and far collision rates.

Searching

To find neighbors of a query, hash it into each table and inspect only the items in its buckets, then verify true distance on those few candidates. This replaces a full scan with examining a small candidate set, which is what makes approximate near neighbor search fast at scale.

Key idea

Locality sensitive hashing uses hash families where collision probability rises with similarity, amplified by combining tables, so near neighbor search inspects only a small candidate set of bucket mates instead of scanning everything.

Check yourself

Answer to earn rating on the learn ladder.

1. What property defines a locality sensitive hash family?

2. Why combine hashes with both AND and OR stages?