The goal of contrastive learning
Contrastive learning trains an encoder so that semantically related inputs map to nearby vectors and unrelated inputs map to distant ones. It learns a useful embedding space without needing explicit class labels for every example.
Positives and negatives
Each anchor example is paired with:
- A positive, something that should be close, such as an augmented view or a paraphrase.
- One or more negatives, inputs that should end up far from the anchor.
The loss rewards high similarity to the positive and low similarity to the negatives.
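To make "close" and "far" concrete, here is a minimal sketch. It assumes cosine similarity as the similarity measure, which is a common choice that the text above does not fix, and it uses made-up 2-D vectors in place of real encoder outputs. The helper `positive_ranks_first` is hypothetical. It checks the ordering that the loss pushes toward: the anchor should be more similar to its positive than to any negative.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def positive_ranks_first(anchor: Sequence[float], positive: Sequence[float],
                         negatives: List[Sequence[float]]) -> bool:
    """Hypothetical helper: True if the positive beats every negative in similarity."""
    pos_sim = cosine(anchor, positive)
    return all(pos_sim > cosine(anchor, n) for n in negatives)


# Toy embeddings (illustrative values only, not from a trained encoder).
anchor = [1.0, 0.2]
positive = [0.9, 0.3]                    # e.g. an augmented view of the anchor
negatives = [[-0.5, 1.0], [0.1, -1.0]]   # unrelated inputs

print(positive_ranks_first(anchor, positive, negatives))  # prints True
```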
The InfoNCE loss
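A common choice is the InfoNCE loss, which frames training as a classification task: pick the true positive out of a set of candidates. For each anchor it takes a softmax over the anchor's similarities to all candidates, which are often divided by a temperature. The loss is the negative log-probability that this softmax assigns to the positive. In the usual in-batch setup, the other items in the batch serve as negatives. A larger batch therefore supplies more negatives per anchor, which makes the task harder and the training signal more informative.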
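The in-batch version described above can be sketched in plain Python. This is a minimal, unoptimized sketch that rests on several assumptions. It uses cosine similarity and a temperature of 0.1, and that default is a hypothetical value that would be tuned per task. Row `i` of `positives` is paired with row `i` of `anchors`. The sketch computes only the anchor-to-positive direction, although some methods also add a symmetric positive-to-anchor term. Real training code would use a tensor library with batched matrix products instead of Python loops.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchors: List[Sequence[float]], positives: List[Sequence[float]],
             temperature: float = 0.1) -> float:
    """Mean in-batch InfoNCE loss (anchor -> positive direction only).

    Row i treats positives[i] as its positive and every positives[j], j != i,
    as a negative, then takes the cross-entropy of picking index i.
    The default temperature of 0.1 is an assumption, not a prescribed value.
    """
    if len(anchors) != len(positives) or not anchors:
        raise ValueError("need a non-empty batch of paired anchors and positives")
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Temperature-scaled similarities to every candidate in the batch.
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Stable log-sum-exp: subtract the max before exponentiating.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy for the true index i: -log softmax(logits)[i].
        total += log_denominator - logits[i]
    return total / len(anchors)


# Toy batch: each anchor's positive is a slightly perturbed copy of it.
anchors = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
positives = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]

aligned = info_nce(anchors, positives)
# Rotate the positives so that every anchor is paired with the wrong item.
mismatched = info_nce(anchors, positives[1:] + positives[:1])
print(aligned < mismatched)  # prints True
```

With well-aligned pairs the loss approaches zero, and mismatching the pairs drives it up. If every embedding in a batch of N is identical, the softmax is uniform and the loss equals log N, which is the value of random guessing among the candidates.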
Why it works
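By repeatedly contrasting positives against negatives, the encoder learns the features that distinguish meaningfully different inputs. It also learns to ignore variation that an anchor shares with its positive, such as the augmentations used to create it. The result is a space where downstream similarity search and classification reduce to simple operations, such as nearest-neighbor lookup or a linear classifier on top of the embeddings.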
Key idea
Contrastive learning shapes an embedding space by pulling positives together and pushing negatives apart, and the InfoNCE loss with many in-batch negatives is the workhorse that makes this scale.