Machine Learning

Metadata Filtering in Vector Search

Combining semantic nearness with hard constraints like date, source, or tenant.

Beyond pure similarity

Sometimes nearness alone is not enough. You may need results only from a certain customer, after a certain date, or in a certain language. Metadata filtering attaches structured fields to each vector and restricts search to the vectors whose fields match the query's constraints.

Two ways to combine

  • Pre filtering: first apply the filter, then search only the matching subset.
  • Post filtering: search by similarity first, then drop results that fail the filter.
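
The two strategies can be sketched with a brute-force search in NumPy. This is a minimal illustration rather than how any particular vector database implements filtering: the function names, the `lang` field, and the toy vectors are all assumptions made up for the example, and the exhaustive scan stands in for a real index.

```python
import numpy as np

def top_k(query, vectors, k):
    """Return indices of the k vectors nearest to query by cosine similarity."""
    sims = vectors @ query / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(query))
    return np.argsort(-sims)[:k]

def pre_filter_search(query, vectors, metadata, predicate, k):
    # Apply the filter first, then search only the matching subset.
    keep = np.array([i for i, m in enumerate(metadata) if predicate(m)], dtype=int)
    if keep.size == 0:
        return []
    local = top_k(query, vectors[keep], k)
    return keep[local].tolist()  # map subset positions back to original ids

def post_filter_search(query, vectors, metadata, predicate, k):
    # Search by similarity first, then drop results that fail the filter.
    candidates = top_k(query, vectors, k)
    return [int(i) for i in candidates if predicate(metadata[i])]

# Hypothetical toy data: five 2-d vectors, each tagged with a language.
vectors = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0], [0.7, 0.3]])
metadata = [{"lang": "en"}, {"lang": "de"}, {"lang": "de"}, {"lang": "en"}, {"lang": "en"}]
query = np.array([1.0, 0.0])

def is_english(m):
    return m["lang"] == "en"

print(pre_filter_search(query, vectors, metadata, is_english, k=2))   # [0, 4]
print(post_filter_search(query, vectors, metadata, is_english, k=2))  # [0]
```

With k = 2, the pre filtered search returns two English results, while the post filtered search returns only one because the second-nearest vector is German and gets dropped.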

The hidden tradeoff

Post filtering is simple but can return too few results if many top matches fail the filter. Pre filtering returns a full result set whenever enough vectors pass the filter, but it is harder to combine with a graph index, since the index is built without knowing your filter ahead of time.
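
The shortfall in post filtering can be made concrete with a small sketch. The lesson does not prescribe a fix; the over-fetch loop below is one common workaround, shown here as an assumption, and the function name, the `tenant` field, and the data are hypothetical. Brute-force ranking again stands in for a real index.

```python
import numpy as np

def post_filter_with_overfetch(query, vectors, metadata, predicate, k):
    """Post filter, doubling the candidate count until k results survive.

    Returns (result ids, number of candidates that had to be fetched).
    """
    # Rank every vector by dot-product similarity, best first.
    order = np.argsort(-(vectors @ query))
    fetch = k
    while True:
        hits = [int(i) for i in order[:fetch] if predicate(metadata[i])]
        if len(hits) >= k or fetch >= len(order):
            return hits[:k], fetch
        fetch = min(2 * fetch, len(order))  # not enough survivors: fetch more

# Hypothetical data: eight vectors, progressively less similar to the query.
vectors = np.array([[1.0 - 0.1 * i, 0.1 * i] for i in range(8)])
metadata = [{"tenant": "a" if i in (3, 6) else "b"} for i in range(8)]
query = np.array([1.0, 0.0])

def is_tenant_a(m):
    return m["tenant"] == "a"

print(post_filter_with_overfetch(query, vectors, metadata, is_tenant_a, k=2))
# ([3, 6], 8)
```

Here a plain post filter with k = 2 would return nothing, because the two nearest vectors both belong to tenant "b". The loop has to widen the search to all eight candidates before two matches survive, which shows how a selective filter erodes the simplicity of post filtering.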

Why it matters

  • Security: a tenant must never see another tenant's data, so filtering by tenant is mandatory.
  • Relevance: results that are recent or in the user's language often matter more than the absolute nearest match.
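
One way to treat the tenant filter as mandatory rather than optional is to build it into the search interface itself. The sketch below assumes a hypothetical `TenantScopedIndex` wrapper with invented tenant names. It is not the API of any real vector store, only an illustration of making the filter impossible to omit.

```python
import numpy as np

class TenantScopedIndex:
    """Hypothetical wrapper: every search is restricted to one tenant's vectors."""

    def __init__(self, vectors, tenant_ids):
        self._vectors = np.asarray(vectors, dtype=float)
        self._tenant_ids = np.asarray(tenant_ids)

    def search(self, tenant_id, query, k):
        # tenant_id is a required argument, so no caller can skip the filter.
        allowed = np.flatnonzero(self._tenant_ids == tenant_id)
        if allowed.size == 0:
            return []
        # Rank only the allowed vectors by dot-product similarity.
        scores = self._vectors[allowed] @ np.asarray(query, dtype=float)
        best = np.argsort(-scores)[:k]
        return allowed[best].tolist()

# Hypothetical data: two tenants sharing one collection.
index = TenantScopedIndex(
    vectors=[[1.0, 0.0], [0.9, 0.1], [0.2, 0.8], [0.1, 0.9]],
    tenant_ids=["acme", "globex", "acme", "globex"],
)

print(index.search("acme", [1.0, 0.0], k=2))    # [0, 2]
print(index.search("globex", [1.0, 0.0], k=2))  # [1, 3]
```

Vector 1 is the second-nearest match overall, yet it never appears in the results for "acme", because the restriction is applied before ranking rather than left to the caller.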

Key idea

Metadata filtering pairs semantic nearness with structured constraints, and the choice of pre or post filtering trades implementation simplicity against guaranteed result counts and security.

Check yourself

1. What is the risk of post filtering?

2. Why is metadata filtering often mandatory for security?