Ranking, not rating
Bayesian personalized ranking, or BPR, attacks recommendation as a ranking problem from implicit feedback. Instead of predicting a number for each item, it learns that a user prefers an item they touched over one they did not.
The pairwise assumption
For each user, sample a triple of user, a positive item the user interacted with, and a negative item they did not. The model should score the positive higher than the negative.
- Define the preference difference as the positive score minus the negative score.
- Push that difference to be large and positive.
The loss
BPR maximizes the log of the sigmoid of the score difference, summed over sampled triples, plus a regularizer. This is the maximum posterior estimate under a Bernoulli model of pairwise order.
- A correctly ordered pair contributes little gradient.
- A wrongly ordered pair pushes the two vectors apart hard.
Why it works well
- It never forces unobserved items to be hard negatives, only relatively lower.
- It pairs naturally with matrix factorization scores trained by stochastic gradient descent.
Key idea
BPR trains recommenders on sampled positive over negative pairs, optimizing relative order with a sigmoid loss rather than absolute scores, which fits implicit feedback naturally.