← Lessons

quiz vs the machine

Gold1440

Machine Learning

The Negative Sampling

How sampling a few negatives makes training over huge label spaces tractable.

5 min read · core · beat Gold to climb

The huge output space problem

In tasks like recommendation or word embeddings, the set of possible items is enormous. Computing a probability over millions of items for every example is too expensive. Negative sampling trains on the true positive plus a small handful of randomly drawn negatives instead.

How it works

  • For each observed positive pair, such as a user and an item they clicked, draw a few items they did not interact with as negatives.
  • Train the model to score the positive high and the sampled negatives low.
  • This replaces a costly full softmax with a cheap comparison against a few items.

Why it works

  • The gradient from a few well chosen negatives approximates the gradient from the full set well enough to learn good representations.
  • It turns an intractable computation into a small fixed cost per example.

Choosing negatives

  • Uniform negatives are simple but may be too easy, since most random items are obviously irrelevant.
  • Sampling proportional to item popularity gives more informative negatives.
  • The hardest cases need deliberate selection, a topic of its own.

Key idea

Negative sampling trains on a positive plus a few sampled negatives, replacing an intractable full softmax with a cheap approximation that still learns strong representations.

Check yourself

Answer to earn rating on the learn ladder.

1. What problem does negative sampling solve?

2. Why might uniform negatives be weak?