The ranker's job
After candidate generation narrows the pool, the ranking model scores each remaining candidate precisely to set the final order. Its accuracy depends heavily on feature quality, so feature design is the heart of the work.
Feature families
- User features: long-term taste embeddings, demographics, recent activity counts.
- Item features: category, age, popularity, content embeddings.
- User-item cross features: past interactions with this item's category or author.
- Context features: device, time of day, query, position in the feed.
The cross features often matter most, since relevance is fundamentally about the match between this user and this item.
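As a rough illustration of a user-item cross feature, the sketch below counts how often a user has interacted with the candidate's category and author. The record schema (`category`, `author`) and the feature names are assumptions made for this example, not part of any particular system.

```python
from collections import Counter


def cross_features(user_history, item):
    """Build user-item cross features from a user's past interactions.

    user_history: list of dicts with 'category' and 'author' keys
                  (hypothetical schema, for illustration only).
    item:         dict with the same keys describing the candidate.
    """
    category_counts = Counter(event["category"] for event in user_history)
    author_counts = Counter(event["author"] for event in user_history)
    total = len(user_history)
    return {
        # How many times the user engaged with this item's category.
        "user_cat_count": category_counts[item["category"]],
        # How many times the user engaged with this item's author.
        "user_author_count": author_counts[item["author"]],
        # Share of the user's history in this category (0.0 for new users).
        "user_cat_share": category_counts[item["category"]] / total if total else 0.0,
    }


history = [
    {"category": "cooking", "author": "a1"},
    {"category": "cooking", "author": "a2"},
    {"category": "travel", "author": "a1"},
]
candidate = {"category": "cooking", "author": "a1"}
features = cross_features(history, candidate)
```

Neither the user's history nor the item's metadata alone produces these values; they exist only for the pair, which is why such features capture the match directly.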
Handling each type
- Hash into buckets or embed sparse categorical features.
- Normalize or log-transform skewed counts.
- Include freshness features, such as item age, so new items are not buried.
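The transforms above can be sketched in a few lines. This is a minimal version using only the standard library; the function names, the CRC32 hash, and the 24-hour half-life are illustrative assumptions rather than recommended settings.

```python
import math
import zlib


def hash_bucket(value: str, num_buckets: int) -> int:
    """Map a sparse categorical value to a stable bucket id.

    CRC32 is used because it is deterministic across processes,
    unlike Python's built-in hash() for strings.
    """
    return zlib.crc32(value.encode("utf-8")) % num_buckets


def log_count(count: int) -> float:
    """Compress a skewed, non-negative count; log1p keeps 0 at 0."""
    return math.log1p(count)


def freshness(item_age_hours: float, half_life_hours: float = 24.0) -> float:
    """Exponential decay in (0, 1]: 1.0 for a brand-new item,
    halving every `half_life_hours` (an assumed tuning parameter)."""
    return 0.5 ** (item_age_hours / half_life_hours)


row = {
    "author_bucket": hash_bucket("author_12345", 1000),
    "log_views": log_count(250_000),
    "freshness": freshness(item_age_hours=6.0),
}
```

In a trained model the bucket id would typically index an embedding table; the hashing step only bounds the vocabulary size.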
Pitfalls
- A leaky feature that encodes the label inflates offline metrics, and the model's performance then collapses online.
- Features available offline but not at serving time cause training-serving skew.
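Both pitfalls can be guarded against with simple checks. The sketch below assumes a hypothetical event log with `user_id`, `category`, and `ts` fields: the first helper computes a count using only events strictly before the label's timestamp, and the second lists training features that the serving path cannot supply. All names here are invented for illustration.

```python
from datetime import datetime


def count_before(events, user_id, category, as_of):
    """Count a user's interactions with a category strictly before `as_of`.

    Excluding events at or after the label time keeps the labeled
    interaction itself out of the feature, which avoids leakage.
    """
    return sum(
        1
        for e in events
        if e["user_id"] == user_id
        and e["category"] == category
        and e["ts"] < as_of
    )


def missing_at_serving(training_features, serving_features):
    """Return training feature names that serving does not provide."""
    return sorted(set(training_features) - set(serving_features))


events = [
    {"user_id": "u1", "category": "cooking", "ts": datetime(2024, 1, 1, 9)},
    # This event is the labeled click, so it must not count toward the feature.
    {"user_id": "u1", "category": "cooking", "ts": datetime(2024, 1, 2, 9)},
]
label_time = datetime(2024, 1, 2, 9)
leak_free_count = count_before(events, "u1", "cooking", label_time)

missing = missing_at_serving(
    training_features=["user_cat_count", "dwell_time_on_item"],
    serving_features=["user_cat_count"],
)
```

Here `dwell_time_on_item` stands in for a feature that is only known after the impression, so it shows up in the missing list and should be dropped or recomputed from data available at request time.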
Key idea
A ranking model's quality is driven by its features, especially user-item cross features, while leakage and training-serving skew are the failure modes that quietly destroy real-world performance.