Raw scores are rarely enough
A pure text score from BM25 ignores context a product owner cares about, such as recency, popularity, or whether the match falls in the title rather than the body. Relevance tuning adjusts the final order to reflect these signals.
Common levers
- Field boosts weight a match in an important field, so a title hit counts more than a body hit.
- Query boosts raise the contribution of one clause inside a larger boolean query.
- Function scoring multiplies or adds a computed value such as a freshness decay or a popularity factor.
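As a rough illustration of how field boosts and function scoring can fit together, here is a minimal sketch. The boost weights, the 30-day half-life, and every function name are assumptions chosen for the example, not values or APIs from any particular engine. The per-field scores stand in for whatever text score (BM25, for instance) the engine produces for each field.

```python
# Hypothetical field weights: a title match counts three times a body match.
FIELD_BOOSTS = {"title": 3.0, "body": 1.0}


def boosted_text_score(field_scores, boosts=FIELD_BOOSTS):
    """Weighted sum of per-field text scores; unlisted fields get weight 1.0."""
    return sum(boosts.get(field, 1.0) * score
               for field, score in field_scores.items())


def freshness_decay(age_days, half_life_days=30.0):
    """Exponential decay: 1.0 for a brand-new document, 0.5 at the half-life."""
    return 0.5 ** (age_days / half_life_days)


def final_score(field_scores, age_days):
    """Function scoring by multiplication: text score times a freshness factor."""
    return boosted_text_score(field_scores) * freshness_decay(age_days)


# Two documents with the same raw text score of 1.0, matched in different fields.
title_hit = final_score({"title": 1.0}, age_days=0)
body_hit = final_score({"body": 1.0}, age_days=0)
```

With these assumed weights the title hit scores 3.0 and the body hit 1.0, and either score halves once the document is 30 days old.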
Combining a text score with these signals must be deliberate. Plain multiplication lets an unbounded popularity value overwhelm text relevance, so engines often dampen such signals with logarithms or saturation curves.
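The effect of dampening can be seen in a small sketch. The function names, the pivot of 100, and the weight of 0.5 are illustrative assumptions. The saturation curve used here, x / (x + pivot), is one common shape among several.

```python
import math


def log_dampen(popularity):
    """Logarithmic dampening: log(1 + x) grows slowly for large x."""
    return math.log1p(popularity)


def saturate(popularity, pivot=100.0):
    """Saturation curve bounded in [0, 1); equals 0.5 when popularity == pivot."""
    return popularity / (popularity + pivot)


def naive_score(text_score, popularity):
    """Unbounded multiply: popularity can swamp the text score."""
    return text_score * popularity


def blended_score(text_score, popularity, weight=0.5, pivot=100.0):
    """Popularity lifts the text score by at most a factor of (1 + weight)."""
    return text_score * (1.0 + weight * saturate(popularity, pivot))


# A weak text match that is very popular vs. a strong match that is obscure.
weak_popular = (1.0, 10_000)
strong_obscure = (8.0, 10)

naive_prefers_weak = naive_score(*weak_popular) > naive_score(*strong_obscure)
blended_prefers_strong = blended_score(*strong_obscure) > blended_score(*weak_popular)
```

Under the naive multiply the weak but popular document wins. With the bounded blend, popularity can raise a score by no more than 50 percent, so the strong text match stays on top.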
Measuring, not guessing
Tuning by intuition drifts: a change that fixes one query can quietly degrade others. Teams therefore build judgment lists, sets of queries whose results have been rated for relevance, and track offline metrics such as normalized discounted cumulative gain (NDCG). Changes then ship behind online experiments, so a boost that looks good offline must also win on real clicks before it stays.
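For reference, a minimal sketch of scoring a judgment list with NDCG follows. It makes two simplifying assumptions: gains are the raw ratings (some teams use 2^rating - 1 instead), and the ideal ordering is built only from the ratings of the results that were returned, not from every judged document for the query. The queries and ratings are made up for illustration.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: the gain at rank i (1-based) is divided by log2(i + 1)."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains))


def ndcg(ratings_in_ranked_order, k=None):
    """DCG of the ranking divided by the DCG of the best possible ordering."""
    ranked = ratings_in_ranked_order[:k]
    ideal = sorted(ratings_in_ranked_order, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


def mean_ndcg(judgments, k=None):
    """Average NDCG over every query in a judgment list."""
    return sum(ndcg(ratings, k) for ratings in judgments.values()) / len(judgments)


# Hypothetical judgment list: query -> ratings (0 to 3) of the results, in ranked order.
judgments = {
    "wireless headphones": [3, 2, 1, 0],  # already in ideal order
    "usb c cable": [0, 3],                # best result ranked second
}
offline_metric = mean_ndcg(judgments)
```

Rerunning the same judgment list before and after a boost change gives a comparable offline number, which can then be checked against the online experiment.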
Key idea
Relevance tuning blends the text score with field boosts and business signals, dampened to stay balanced and validated by judgment lists and online experiments.