The idea
Content-based filtering builds a user profile from the attributes of items the user engaged with, then recommends other items whose attributes match that profile. If you watched several space documentaries, it scores other space documentaries highly.
How it works
- Describe each item with a feature vector. For text this might be TF-IDF weights over words; for movies it could be genre, cast, and tags.
- Build a user profile by aggregating the vectors of items the user liked, often as a weighted average.
- Score a candidate item by its similarity to the user profile, commonly cosine similarity.
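The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the item names, the three-feature layout (space, documentary, comedy), and the ratings used as weights are all made up for the example.

```python
import math

# Hypothetical catalog: each item is a vector over [space, documentary, comedy].
ITEMS = {
    "cosmos": [1.0, 1.0, 0.0],
    "apollo_story": [1.0, 1.0, 0.0],
    "mars_rovers": [1.0, 1.0, 0.0],
    "ocean_life": [0.0, 1.0, 0.0],
    "standup_night": [0.0, 0.0, 1.0],
    "sitcom_reruns": [0.0, 0.0, 1.0],
}


def build_profile(liked):
    """Weighted average of liked item vectors; `liked` maps item name -> weight."""
    total = sum(liked.values())
    dim = len(next(iter(ITEMS.values())))
    profile = [0.0] * dim
    for name, weight in liked.items():
        for i, value in enumerate(ITEMS[name]):
            profile[i] += weight * value / total
    return profile


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(liked, k=2):
    """Rank items the user has not engaged with by similarity to their profile."""
    profile = build_profile(liked)
    scored = [(cosine(profile, vec), name)
              for name, vec in ITEMS.items() if name not in liked]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Assumed ratings, used here as the averaging weights.
liked = {"cosmos": 5.0, "apollo_story": 4.0}
top = recommend(liked, k=2)
print(top)
```

With these toy vectors the space documentary ranks first and the non-space documentary second, while the comedy items score zero because they share no features with the profile.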
Strengths
- Works for a new item the moment it has attributes, even with zero interactions.
- Recommendations are explainable: "recommended because you liked X, which is about space."
- No dependency on other users, so it works even with a small user base.
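One way to see the first two strengths together is to score a brand-new item and read the explanation straight off the features that contribute most to the match. The sketch below assumes a hypothetical three-feature layout and a made-up user profile; `explain` is an illustrative helper, not part of any library.

```python
# Hypothetical feature names, aligned with the vector positions below.
FEATURES = ["space", "documentary", "comedy"]


def explain(profile, item_vec, top_n=2):
    """Return the features contributing most to the profile-item dot product."""
    contributions = [(p * x, name)
                     for p, x, name in zip(profile, item_vec, FEATURES)]
    # Keep only features that actually pushed the score up.
    contributions = [c for c in contributions if c[0] > 0]
    contributions.sort(reverse=True)
    return [name for _, name in contributions[:top_n]]


profile = [0.9, 0.6, 0.0]   # assumed profile of a user who likes space documentaries
new_item = [1.0, 1.0, 0.0]  # just added to the catalog, zero interactions so far

reasons = explain(profile, new_item)
print("Recommended because you like:", ", ".join(reasons))
```

The new item can be scored and explained using only its attributes and this one user's profile; no interaction data from other users is involved.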
Weaknesses
- It tends to recommend items much like those the user has already seen, so it stays inside a narrow lane and rarely surprises the user.
- It needs good attributes, which are expensive to curate.
Key idea
Content-based filtering matches items to a user profile built from item attributes, giving explainable picks but limited novelty.