The brief
Recommend items a user is likely to engage with, from a catalog of millions, within a tight latency budget.
Two-stage architecture
You cannot score millions of items per request, so split the work.
- Candidate generation: cheaply narrow millions of items to a few hundred, often via embedding nearest neighbor search
- Ranking: apply a richer model to score and order that short list
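The funnel can be sketched in a few lines. This is a minimal illustration, not a production design: it assumes dot-product similarity, uses an exact top-k scan where a real system would use an approximate index, and stands in a made-up "freshness" bonus for the richer ranking model. All function and field names are hypothetical.

```python
import heapq


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def generate_candidates(user_vec, item_vecs, k):
    """Stage 1: cheap retrieval by embedding similarity.

    Exact scan for clarity; an approximate index would replace this at scale.
    """
    return heapq.nlargest(
        k, item_vecs, key=lambda item_id: dot(user_vec, item_vecs[item_id])
    )


def rank(user_vec, candidates, item_vecs, item_features):
    """Stage 2: richer (more expensive) scoring, applied to the short list only."""

    def score(item_id):
        # Hypothetical ranking model: similarity plus a freshness bonus.
        similarity = dot(user_vec, item_vecs[item_id])
        return similarity + 0.5 * item_features[item_id]["freshness"]

    return sorted(candidates, key=score, reverse=True)


def recommend(user_vec, item_vecs, item_features, n_candidates=3, n_results=2):
    """Run both stages and return the top n_results item ids."""
    candidates = generate_candidates(user_vec, item_vecs, n_candidates)
    return rank(user_vec, candidates, item_vecs, item_features)[:n_results]
```

Note that an item dropped in stage 1 can never be recovered by stage 2, however well the ranker would have scored it, so candidate recall bounds the quality of the whole funnel.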
Data and metrics
- Labels: clicks and conversions, with exploration to fight feedback loops
- Offline: ranking metrics like NDCG on logged data
- Online: A/B test on engagement, with guardrails on long-term retention
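For reference, NDCG at cutoff k can be computed as below. This sketch assumes linear gain (some definitions use 2^rel - 1 instead) and takes the ideal ordering from the same list of logged relevances, so it only measures how well the shown items were ordered. The function names are illustrative.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k positions (linear gain)."""
    return sum(
        rel / math.log2(position + 2)  # position 0 gets discount log2(2) = 1
        for position, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the best possible ordering of the same list."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant items at all; define NDCG as 0
    return dcg_at_k(relevances, k) / ideal
```

Here `relevances` lists the logged label (for example 1 for a click, 0 otherwise) of each item in the order the model ranked it.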
Serving choices
- Precompute item embeddings in batch, refresh user embeddings in near real time
- Use an approximate nearest neighbor index for fast candidate lookup
- Add a fallback to popular items when the model or features fail
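The fallback in the last bullet might look like the following sketch. It assumes the model is exposed as a callable `model_recommend(user_id)` and that `popular_items` is a precomputed list; both names are hypothetical.

```python
def recommend_with_fallback(user_id, model_recommend, popular_items, n=10):
    """Return model recommendations, or popular items if the model path fails."""
    try:
        items = model_recommend(user_id)
    except Exception:
        # Broad catch is deliberate here: any model or feature failure should
        # degrade to the fallback rather than fail the request. A real service
        # would also log the error and emit a metric.
        items = []
    if not items:
        return list(popular_items[:n])
    return list(items[:n])
```

An empty result is treated the same as an exception, which also covers cases such as a new user with no embedding yet.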
Operations
Monitor drift and engagement, retrain as tastes shift, and watch for popularity feedback loops that narrow the catalog.
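One simple way to watch for catalog narrowing is to track how much of the catalog is actually shown and how concentrated impressions are. The sketch below assumes an impression log given as a flat list of item ids. The two metrics and their names are illustrative choices, not a standard.

```python
from collections import Counter


def catalog_coverage(impressions, catalog_size):
    """Fraction of the catalog that appeared at least once in the log."""
    if catalog_size <= 0:
        return 0.0
    return len(set(impressions)) / catalog_size


def top_share(impressions, top_n):
    """Fraction of all impressions that went to the top_n most-shown items."""
    if not impressions:
        return 0.0
    counts = Counter(impressions)
    top_total = sum(count for _, count in counts.most_common(top_n))
    return top_total / len(impressions)
```

Falling coverage or a rising top share across retraining cycles would suggest that a popularity feedback loop is taking hold.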
Key idea
A production recommender is a two-stage funnel: cheap candidate generation, then rich ranking, glued together by feature stores, exploration, A/B tests, and fallbacks.