When several answers are right
Many retrieval tasks have multiple relevant documents per query. MAP, mean average precision, rewards ranking all of them high, not just the first.
Average precision per query
For a single query, walk down the ranked list.
- Each time you hit a relevant item, record the precision at that position
- Average those precision values over all relevant items for the query
- That average is the average precision for the query
This naturally rewards clustering relevant items near the top, because early hits have high precision.
From AP to MAP
MAP is simply the mean of average precision across all queries. It gives one number summarizing retrieval quality over a whole benchmark.
Compared to alternatives
- Unlike MRR, MAP accounts for every relevant item, not just the first
- Unlike NDCG, MAP assumes binary relevance, relevant or not, with no graded levels
- It is a long standing standard in information retrieval evaluation
Use MAP when relevance is binary but multiple documents matter and their positions should all count.
Key idea
MAP averages precision at each relevant hit per query, then averages over queries. It scores multi answer binary retrieval where every relevant item should rank high.