What we are solving
A recommender system tries to surface items a user is likely to engage with from a catalog far too large to browse. A streaming service such as Netflix offers thousands of titles; a large online store may list millions of products. The job is to rank the catalog for each person at each moment.
Why it is hard
- The user-item interaction matrix is massive and mostly empty: most users touch only a tiny slice of the catalog.
- Signals are noisy and indirect. A click is not love; an absence is not hate.
- Tastes drift over time and depend on context like device or time of day.
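To make the sparsity point concrete, here is a minimal sketch that stores only the observed user-item pairs and measures how much of the full matrix they fill. The users, items, and variable names are invented for illustration and are not taken from any real dataset.

```python
from collections import defaultdict

# Hypothetical toy interaction log of (user, item) pairs; all names are illustrative.
interactions = [
    ("ana", "item_1"), ("ana", "item_3"),
    ("ben", "item_2"),
    ("cy", "item_1"), ("cy", "item_4"),
]
catalog = ["item_1", "item_2", "item_3", "item_4", "item_5", "item_6"]

# Store only the observed cells instead of a dense users-by-items grid.
seen = defaultdict(set)
for user, item in interactions:
    seen[user].add(item)

n_cells = len(seen) * len(catalog)                       # size of the full matrix
n_observed = sum(len(items) for items in seen.values())  # filled cells
density = n_observed / n_cells

print(f"{n_observed} of {n_cells} cells observed, density = {density:.3f}")
```

Even in this tiny example most cells are empty. With real catalogs the observed fraction is typically far smaller, which is why interactions are usually kept in a sparse structure like the one above rather than a dense array.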
The standard framing
We observe interactions between users and items. We want a scoring function that estimates how relevant an item is to a user; we then sort the items by that score. Quality is judged with ranking metrics such as precision at k, recall at k, and normalized discounted cumulative gain (NDCG).
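The score-then-sort framing and the three metrics can be sketched in a few lines. This is an illustrative sketch only: it assumes binary relevance (an item is either relevant or not), and the function names, scores, and item ids are made up for the example.

```python
import math

def rank_items(scores, k):
    """Return the k item ids with the highest score, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

def precision_at_k(ranked, relevant, k):
    """Fraction of the top k that is relevant."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: hits near the top count more than hits lower down."""
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(ranked[:k]) if item in relevant)
    # Ideal ordering puts every relevant item first.
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical scores from some model for one user, plus the items that user actually liked.
scores = {"a": 0.9, "b": 0.2, "c": 0.7, "d": 0.4}
relevant = {"a", "d"}

top3 = rank_items(scores, 3)
print(top3)
print(precision_at_k(top3, relevant, 3))
print(recall_at_k(top3, relevant, 3))
print(ndcg_at_k(top3, relevant, 3))
```

Here both relevant items make the top three, so recall is perfect, but precision is lower because one slot goes to an irrelevant item. NDCG falls below 1 because a relevant item sits in third place rather than second.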
The two big families are content-based methods, which use item attributes, and collaborative methods, which use the behavior of other users.
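The difference between the two families shows up even on toy data. The sketch below uses deliberately simple stand-ins: tag overlap (Jaccard similarity) for the content-based side, and a count of like-minded users for the collaborative side. The data, function names, and scoring rules are all assumptions chosen for illustration; real systems use much richer models.

```python
# Hypothetical toy data: item attributes and each user's liked items.
item_tags = {
    "alien": {"scifi", "horror"},
    "blade_runner": {"scifi", "noir"},
    "notebook": {"romance", "drama"},
}
user_likes = {
    "ana": {"alien"},
    "ben": {"alien", "notebook"},
    "cy": {"alien", "notebook"},
}

def jaccard(a, b):
    """Overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def content_score(user, item):
    """Content-based: best attribute overlap between the candidate and the user's liked items."""
    return max((jaccard(item_tags[item], item_tags[liked])
                for liked in user_likes[user]), default=0.0)

def collaborative_score(user, item):
    """Collaborative: how many other users liked the candidate and share a liked item with this user."""
    mine = user_likes[user]
    return sum(1 for other, theirs in user_likes.items()
               if other != user and (item in theirs) and bool(mine & theirs))

for candidate in ("blade_runner", "notebook"):
    print(candidate, content_score("ana", candidate), collaborative_score("ana", candidate))
```

For the user "ana", the content-based score favors "blade_runner" because it shares a tag with something she liked. The collaborative score favors "notebook" because users with overlapping taste liked it, even though its attributes look nothing like her history.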
Key idea
Recommendation is ranking the catalog per user from sparse, noisy, drifting feedback, judged by how well the top results match real interest.