Matrix Factorization

From neighbors to latent factors

Neighborhood methods compare rows or columns directly. Matrix factorization instead learns a compact representation: every user gets a vector and every item gets a vector in the same low dimensional space. The predicted rating is their dot product.

The model

Let user u have vector p of length k and item i have vector q of length k.
The predicted score is the dot product of p and q, often plus user and item bias terms and a global mean.
The dimensions of these vectors are latent factors, automatically discovered axes such as how much an item leans toward action or romance.

Learning the vectors

Minimize the squared error between predicted and observed ratings on the entries we do have, plus an L2 regularization term to prevent overfitting. Two common solvers are stochastic gradient descent and alternating least squares, where you fix item vectors and solve for users, then swap.

Why it works

It generalizes across the sparse matrix: even items the user never touched get a score through shared latent dimensions. This was the key insight behind the Netflix Prize winners.

Key idea

Matrix factorization learns low dimensional user and item vectors whose dot product predicts ratings, generalizing across a sparse matrix.

Matrix Factorization

From neighbors to latent factors

The model

Learning the vectors

Why it works

Key idea

Check yourself