From neighbors to latent factors
Neighborhood methods compare rows or columns directly. Matrix factorization instead learns a compact representation: every user gets a vector and every item gets a vector in the same low dimensional space. The predicted rating is their dot product.
The model
- Let user u have vector p of length k and item i have vector q of length k.
- The predicted score is the dot product of p and q, often plus user and item bias terms and a global mean.
- The dimensions of these vectors are latent factors, automatically discovered axes such as how much an item leans toward action or romance.
Learning the vectors
Minimize the squared error between predicted and observed ratings on the entries we do have, plus an L2 regularization term to prevent overfitting. Two common solvers are stochastic gradient descent and alternating least squares, where you fix item vectors and solve for users, then swap.
Why it works
It generalizes across the sparse matrix: even items the user never touched get a score through shared latent dimensions. This was the key insight behind the Netflix Prize winners.
Key idea
Matrix factorization learns low dimensional user and item vectors whose dot product predicts ratings, generalizing across a sparse matrix.