Case Study: Recommendation System

The brief

Recommend items a user is likely to engage with, from a catalog of millions, within a tight latency budget.

Two-stage architecture

You cannot score millions of items per request, so split the work.

Candidate generation: cheaply narrow millions of items to a few hundred, often via embedding nearest-neighbor search
Ranking: apply a richer model to score and order that short list
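
The two stages above can be sketched in a few lines. This is a minimal illustration, not a production design: the function names (generate_candidates, rank, recommend), the dot-product similarity, and the brute-force search are all assumptions made for the example. A real system would replace the brute-force scan with an approximate index and the score function with a trained ranking model.

```python
import heapq
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def generate_candidates(user_vec: Vector,
                        item_vecs: Dict[str, Vector],
                        k: int) -> List[str]:
    """Stage 1: cheap retrieval by embedding similarity.

    Brute force over all items for clarity (an assumption of this sketch);
    at catalog scale this would be an approximate nearest-neighbor lookup.
    """
    return heapq.nlargest(
        k, item_vecs, key=lambda item_id: dot(user_vec, item_vecs[item_id])
    )


def rank(candidates: List[str],
         score_fn: Callable[[str], float]) -> List[str]:
    """Stage 2: apply a richer, more expensive scorer to the short list only."""
    return sorted(candidates, key=score_fn, reverse=True)


def recommend(user_vec: Vector,
              item_vecs: Dict[str, Vector],
              score_fn: Callable[[str], float],
              k_candidates: int = 200,
              k_final: int = 10) -> List[str]:
    """Run the funnel: narrow to k_candidates, then rank and keep k_final."""
    candidates = generate_candidates(user_vec, item_vecs, k_candidates)
    return rank(candidates, score_fn)[:k_final]
```

The point of the split is cost: the expensive scorer runs on at most k_candidates items per request, regardless of catalog size.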

Data and metrics

Labels: clicks and conversions, with exploration to counter feedback loops
Offline: ranking metrics such as NDCG on logged data
Online: A/B test on engagement, with guardrails on long-term retention
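
For the offline metric, here is a small sketch of NDCG at a cutoff k. It assumes the linear-gain form of DCG (relevance divided by log2 of position plus one) and computes the ideal ordering from the same logged relevance list; other conventions, such as the exponential gain 2^rel - 1, are also in common use. The function names are illustrative.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top k positions.

    relevances[i] is the logged relevance (e.g. 1 for a click, 0 otherwise)
    of the item the model placed at position i. Linear gain is assumed.
    """
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG normalized by the DCG of the best possible ordering of the same items."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

A list with the single clicked item in first position scores 1.0; the same click pushed to second position scores 1 / log2(3), roughly 0.63.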

Serving choices

Precompute item embeddings in batch; refresh user embeddings in near real time
Use an approximate nearest-neighbor (ANN) index for fast candidate lookup
Add a fallback to popular items when the model or features fail
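
The fallback in the last point might look like the sketch below. The names (recommend_with_fallback, personalized, popular_items) are hypothetical, and padding a short personalized list with popular items is an added assumption beyond the outright-failure case described above. A production version would also log the failure and enforce a timeout on the personalized call.

```python
from typing import Callable, List


def recommend_with_fallback(user_id: str,
                            personalized: Callable[[str], List[str]],
                            popular_items: List[str],
                            k: int = 10) -> List[str]:
    """Return k items, degrading to popular items if personalization fails."""
    try:
        # Copy so the caller's list is never mutated.
        items = list(personalized(user_id))
    except Exception:
        # Model or feature lookup failed: start from an empty list.
        items = []

    # Pad with popular items (assumption: also used when the list is short),
    # skipping anything already present.
    seen = set(items)
    for item in popular_items:
        if len(items) >= k:
            break
        if item not in seen:
            items.append(item)
            seen.add(item)
    return items[:k]
```

Keeping the popular list precomputed and in memory means the fallback path has no dependencies that can fail alongside the model.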

Operations

Monitor drift and engagement, retrain as tastes shift, and watch for popularity feedback loops that narrow the catalog.
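
One way to watch for a narrowing catalog is to track how much of it is actually shown and how concentrated impressions are. The two metrics below (catalog_coverage and top_share) are illustrative choices for this sketch, not metrics named in the case study; thresholds and time windows would depend on the product.

```python
from collections import Counter
from typing import List


def catalog_coverage(recommendation_lists: List[List[str]],
                     catalog_size: int) -> float:
    """Fraction of the catalog that appeared in at least one recommendation list."""
    shown = set()
    for recs in recommendation_lists:
        shown.update(recs)
    return len(shown) / catalog_size


def top_share(recommendation_lists: List[List[str]], n: int) -> float:
    """Fraction of all impressions that went to the n most-shown items."""
    counts = Counter(item for recs in recommendation_lists for item in recs)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(count for _, count in counts.most_common(n)) / total
```

Falling coverage or a rising top share across successive windows is a signal that a popularity feedback loop may be taking hold.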

Key idea

A production recommender is a two-stage funnel: cheap candidate generation, then rich ranking, held together by feature stores, exploration, A/B tests, and fallbacks.