The Bayesian Personalized Ranking

Ranking, not rating

Bayesian personalized ranking, or BPR, attacks recommendation as a ranking problem from implicit feedback. Instead of predicting a number for each item, it learns that a user prefers an item they touched over one they did not.

The pairwise assumption

For each user, sample a triple of user, a positive item the user interacted with, and a negative item they did not. The model should score the positive higher than the negative.

Define the preference difference as the positive score minus the negative score.
Push that difference to be large and positive.

The loss

BPR maximizes the log of the sigmoid of the score difference, summed over sampled triples, plus a regularizer. This is the maximum posterior estimate under a Bernoulli model of pairwise order.

A correctly ordered pair contributes little gradient.
A wrongly ordered pair pushes the two vectors apart hard.

Why it works well

It never forces unobserved items to be hard negatives, only relatively lower.
It pairs naturally with matrix factorization scores trained by stochastic gradient descent.

Key idea