The setup
A recommendation problem is a giant, mostly empty matrix of users by items. Matrix factorization approximates it as the product of two smaller matrices, one for users and one for items.
Latent factors
Each user becomes a short vector of latent factors, and so does each item. A predicted rating is the dot product of a user vector and an item vector.
- Factors are not labeled, but they often capture themes like genre or price level.
- A high score means the user vector aligns with the item vector.
- The model learns factors that reconstruct the observed ratings.
How it trains
- Minimize squared error on known entries only, ignoring the empty cells.
- Add regularization so factors do not overfit sparse data.
- Solve with stochastic gradient descent or alternating least squares.
Why it works well
- It generalizes to unseen user item pairs by combining their factors.
- It scales far better than storing a full similarity matrix.
- It often beats neighbor methods on sparse, large catalogs.
Key idea
Matrix factorization represents users and items as latent factor vectors whose dot product predicts ratings, generalizing across a sparse matrix far better than neighbor methods.