Pretext Tasks and Self-Supervision

The label bottleneck

Supervised learning needs human labels, which are slow and costly to produce, yet the world is full of unlabeled text, images, and audio. Self-supervised learning turns that raw data into a training signal by deriving the prediction target from the data itself, for example by hiding part of each example and asking the model to predict it.

Pretext tasks

The model trains on a pretext task, a task whose answer comes from the data itself, so no human annotation is needed:

Predict a masked word from its surrounding context
Predict the next token in a sequence
Decide whether two augmented views come from the same image

Solving these tasks well requires the model to pick up underlying structure such as grammar, object parts, and semantic relationships. The result is a general-purpose representation that can be reused for other tasks.
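
To make the first two pretext tasks above concrete, here is a minimal sketch of how input and target pairs can be manufactured from a raw sentence with no human labeling. The function names, the "[MASK]" placeholder, and the whitespace tokenization are illustrative assumptions, not part of any particular library; real systems typically use subword tokenizers and mask only a random subset of positions.

```python
def masked_word_examples(sentence, mask_token="[MASK]"):
    """Build one (masked_input, target_word) pair per word position.

    Hypothetical helper: splits on whitespace and masks each word in turn.
    """
    words = sentence.split()
    examples = []
    for i, target in enumerate(words):
        masked = words[:i] + [mask_token] + words[i + 1:]
        examples.append((" ".join(masked), target))
    return examples


def next_token_examples(sentence):
    """Build one (prefix_tokens, next_token) pair per position after the first.

    Hypothetical helper: the "label" is simply the token that follows the prefix.
    """
    tokens = sentence.split()
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


text = "the cat sat on the mat"
masked = masked_word_examples(text)
next_tok = next_token_examples(text)

print(masked[1])    # ('the [MASK] sat on the mat', 'cat')
print(next_tok[2])  # (['the', 'cat', 'sat'], 'on')
```

Every target here is copied out of the sentence itself, which is the sense in which the data supplies its own labels. The third task would follow the same pattern: two augmentations of one image form a "same" pair, and augmentations of different images form a "different" pair.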

Why it changed the field

After self-supervised pretraining on huge corpora, a model typically needs only a small labeled set to fine-tune for a specific task. This pretrain-then-adapt recipe underlies modern language models and many vision systems.

Key idea

Self-supervised learning manufactures labels from unlabeled data through pretext tasks, yielding strong representations that transfer with little labeled data.
