



Query Expansion

Enriching a short query so retrieval has more to match against.


The short query problem

Real queries are often terse, ambiguous, or missing the exact words that relevant documents use. Query expansion rewrites or enriches the query before retrieval so that it overlaps more with those passages.

Common techniques

  • Synonym and term expansion: add related words so keyword search casts a wider net.
  • Rewriting: a language model rephrases a vague query into a clearer one.
  • Hypothetical document: have a language model draft a plausible answer, embed that draft, and search with its vector instead of the bare question's.
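A minimal sketch of the first technique, synonym expansion for keyword search. The synonym table `SYNONYMS` and the helpers `tokenize`, `expand_query`, and `keyword_overlap` are hypothetical illustrations, not any library's API; a real system would draw related terms from a thesaurus, query logs, or a model.

```python
# Hypothetical synonym table; a real system might use a thesaurus or learned term associations.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "fix": ["repair", "mend"],
}

def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace (punctuation handling omitted for brevity)."""
    return text.lower().split()

def expand_query(query: str, synonyms: dict[str, list[str]]) -> list[str]:
    """Return each original term followed by its synonyms, skipping duplicates."""
    terms: list[str] = []
    for term in tokenize(query):
        for candidate in [term, *synonyms.get(term, [])]:
            if candidate not in terms:
                terms.append(candidate)
    return terms

def keyword_overlap(terms: list[str], document: str) -> int:
    """Count how many query terms appear in the document (a crude keyword match)."""
    doc_terms = set(tokenize(document))
    return sum(1 for t in terms if t in doc_terms)

doc = "how to repair an automobile engine"
bare = tokenize("fix car")
expanded = expand_query("fix car", SYNONYMS)
print(bare, keyword_overlap(bare, doc))          # ['fix', 'car'] 0
print(expanded, keyword_overlap(expanded, doc))  # ['fix', 'repair', 'mend', 'car', 'automobile', 'vehicle'] 2
```

The bare query shares no words with the document, while the expanded query matches "repair" and "automobile".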

Why the hypothetical answer trick works

A question and its answer often use different words, so the question's embedding may sit far from the answer's. A generated answer, even one with wrong details, tends to use the vocabulary and phrasing of real answers, so searching with its embedding lands closer to the relevant passages.
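The sketch below illustrates the vocabulary mismatch with toy stand-ins, assumed only so the example runs on the standard library: `embed` is a bag-of-words count vector rather than a neural embedding model, and `generate_hypothetical_answer` is a hypothetical stub returning fixed text where a language model would normally write the draft answer.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing keys
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def generate_hypothetical_answer(question: str) -> str:
    """Hypothetical stub: in practice a language model would draft this answer."""
    return "blue light scatters more because air molecules scatter shorter wavelengths"

def search(query_vec: Counter, passages: list[str]) -> list[str]:
    """Rank passages by cosine similarity to the query vector, best first."""
    return sorted(passages, key=lambda p: cosine(query_vec, embed(p)), reverse=True)

passages = [
    "sunlight scatters off air molecules and shorter blue wavelengths scatter most",
    "why is the sky so blue in this painting",
]
question = "why is the sky blue"

print(search(embed(question), passages)[0])  # the painting passage wins on shared question words
print(search(embed(generate_hypothetical_answer(question)), passages)[0])  # the scattering passage wins
```

Here the question's words match the off-topic painting passage, while the draft answer shares most of its words with the passage that actually answers it. A dense embedding model measures semantic rather than purely lexical similarity, so this toy only illustrates the direction of the effect, not its size.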

The risks

  • Drift: an expanded query can wander off topic and pull in noise.
  • Latency: generating a rewrite or a hypothetical answer adds a model call before retrieval.
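Drift is easy to reproduce with naive expansion. In this hypothetical sketch, the table `EXPANSIONS` blindly adds the animal sense of "python" to a query about the programming language, and a simple term-overlap score then ranks the off-topic document first.

```python
# Hypothetical expansion table that blindly adds the animal sense of "python".
EXPANSIONS = {"python": ["snake", "reptile", "pet"]}

DOCS = {
    "tutorial": "install the python interpreter on linux",
    "pets": "snake care guide for a pet python",
}

def expand(query: str) -> list[str]:
    """Return the query terms followed by every listed expansion term."""
    terms = query.lower().split()
    return terms + [extra for t in terms for extra in EXPANSIONS.get(t, [])]

def score(terms: list[str], doc: str) -> int:
    """Count query terms that appear in the document."""
    doc_terms = set(doc.split())
    return sum(t in doc_terms for t in terms)

def best_match(terms: list[str]) -> str:
    """Name of the highest-scoring document."""
    return max(DOCS, key=lambda name: score(terms, DOCS[name]))

bare = "python install".split()
print(best_match(bare))                      # tutorial (scores 2 vs 1)
print(best_match(expand("python install")))  # pets (scores 3 vs 2): the expansion drifted off topic
```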

Expansion trades precision for recall: it helps most when bare queries under-retrieve, and it costs precision when the added terms or text match irrelevant passages.
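One way to act on this trade-off is to expand only when the bare query under-retrieves, which also avoids paying the extra latency on every query. This is a minimal sketch under assumptions: the `min_hits` threshold, the toy `SYNONYMS` table, and the helper names are hypothetical, and the synonym lookup stands in for any expansion step, such as a rewrite or a hypothetical answer.

```python
# Hypothetical synonym table standing in for any expansion step.
SYNONYMS = {"automobile": ["car", "vehicle"], "car": ["automobile", "vehicle"]}

CORPUS = [
    "repair an automobile engine",
    "vehicle maintenance schedule",
    "car wash prices",
    "used car listings",
]

def expand(terms: list[str]) -> list[str]:
    """Append synonyms of each term, skipping duplicates."""
    out = list(terms)
    for term in terms:
        for syn in SYNONYMS.get(term, []):
            if syn not in out:
                out.append(syn)
    return out

def retrieve(terms: list[str], corpus: list[str]) -> list[str]:
    """Return documents sharing at least one term with the query, in corpus order."""
    wanted = set(terms)
    return [doc for doc in corpus if wanted & set(doc.split())]

def retrieve_with_fallback(query: str, corpus: list[str], min_hits: int = 2) -> tuple[list[str], bool]:
    """Try the bare query first; expand only if it returns fewer than min_hits documents."""
    terms = query.lower().split()
    hits = retrieve(terms, corpus)
    if len(hits) >= min_hits:
        return hits, False  # bare query was enough, so no expansion cost is paid
    return retrieve(expand(terms), corpus), True

print(retrieve_with_fallback("car", CORPUS))         # (['car wash prices', 'used car listings'], False)
print(retrieve_with_fallback("automobile", CORPUS))  # all four documents, True
```

"car" returns two hits and skips expansion. "automobile" returns only one, so the fallback expands it and recovers all four documents, including "car wash prices", which may not be what the user wanted: the recall gain comes at some cost in precision.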

Key idea

Query expansion enriches a terse query through synonyms, rewriting, or a hypothetical answer so it overlaps relevant passages, improving recall at the risk of drift and added latency.

Check yourself


1. Why does searching with a hypothetical answer often help?

2. What is a risk of query expansion?