The Active Learning Loop

How a model picks the most informative examples to label next, cutting annotation cost.

The labeling budget problem

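Labels cost money and expert time, so labeling every available example is often wasteful: many examples are redundant and teach the model little. Active learning lets the model choose which unlabeled examples would teach it the most, so each label buys more accuracy than a randomly chosen one would.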

The loop

Train a model on the current labeled set.
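Run it over the unlabeled pool and score how informative each example would be to label.
Send the top-scoring examples to human labelers.
Add the newly labeled examples to the labeled set, remove them from the pool, retrain, and repeat until the labeling budget is spent or accuracy stops improving.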
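The four steps above can be sketched as a short pool-based loop in Python. This is a minimal illustration under stated assumptions, not a reference implementation: the model is a one-feature logistic regression fit by plain gradient descent, the hypothetical `oracle` function stands in for the human labelers, and an example counts as informative when its predicted probability is close to 0.5. The names `train`, `oracle`, and `active_learning_loop` are illustrative only.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train(xs, ys, epochs=2000, lr=1.0):
    """Fit p(y=1 | x) = sigmoid(w*x + b) with batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # gradient of log loss w.r.t. the logit
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

def oracle(x):
    # Hypothetical human labeler: the true class boundary is at x = 0.3.
    return 1 if x > 0.3 else 0

def active_learning_loop(pool, seed_size=4, batch_size=2, rounds=5, rng=None):
    rng = rng or random.Random(0)
    pool = list(pool)
    rng.shuffle(pool)
    labeled_x = pool[:seed_size]                 # small random seed set
    labeled_y = [oracle(x) for x in labeled_x]
    unlabeled = pool[seed_size:]
    for _ in range(rounds):
        w, b = train(labeled_x, labeled_y)       # 1. train on the labeled set
        # 2. score the pool: smaller |p - 0.5| means less confident, so more informative
        ranked = sorted(unlabeled, key=lambda x: abs(sigmoid(w * x + b) - 0.5))
        # 3. send the top-scoring batch to the labeler; the rest stays in the pool
        batch, unlabeled = ranked[:batch_size], ranked[batch_size:]
        # 4. add the new labels; the next iteration retrains on them
        labeled_x += batch
        labeled_y += [oracle(x) for x in batch]
    return train(labeled_x, labeled_y), labeled_x

rng = random.Random(42)
pool = [rng.uniform(-1.0, 1.0) for _ in range(200)]
(w, b), labeled = active_learning_loop(pool)
acc = sum((sigmoid(w * x + b) > 0.5) == (oracle(x) == 1) for x in pool) / len(pool)
print(f"labeled {len(labeled)} of {len(pool)} examples, pool accuracy {acc:.2f}")
```

Each round the chosen batch is split off from the unlabeled pool, so no example is sent for labeling twice.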

Choosing what to label

Uncertainty sampling picks examples where the model is least confident, for example, those where a binary classifier's predicted probability is near 0.5, that is, near the decision boundary.
Diversity sampling avoids labeling many near-duplicates by spreading picks across the data.
Practical systems often blend both, because the most uncertain points tend to sit in the same cluster, so a batch chosen by uncertainty alone can be highly redundant.
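One simple way to blend the two criteria is a greedy batch builder, sketched below as an assumption rather than a standard recipe: uncertainty is measured as binary entropy of the predicted probability, diversity as the distance to the nearest point already chosen for the batch, and the hypothetical weight `alpha` trades one off against the other. Many other blends exist, such as clustering the pool and taking the most uncertain point per cluster.

```python
import math

def entropy(p):
    """Binary entropy in bits: 1.0 at p = 0.5, 0.0 at p = 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def select_batch(points, probs, k, alpha=0.5):
    """Greedily pick k indices, weighing uncertainty (alpha) against
    distance to points already chosen for this batch (1 - alpha)."""
    chosen = []
    candidates = list(range(len(points)))
    while candidates and len(chosen) < k:
        # Diversity term: distance to the nearest point already in the batch,
        # scaled to [0, 1]; the first pick is driven by uncertainty alone.
        if chosen:
            dist = {i: min(math.dist(points[i], points[j]) for j in chosen) for i in candidates}
            max_d = max(dist.values()) or 1.0
            div = {i: dist[i] / max_d for i in candidates}
        else:
            div = {i: 0.0 for i in candidates}
        best = max(candidates, key=lambda i: alpha * entropy(probs[i]) + (1 - alpha) * div[i])
        chosen.append(best)
        candidates.remove(best)
    return chosen

# Three near-duplicates near the origin, all very uncertain, plus one distant point.
points = [(0.0, 0.0), (0.01, 0.0), (0.0, 0.01), (5.0, 5.0)]
probs = [0.5, 0.49, 0.51, 0.4]
print(select_batch(points, probs, k=2))             # [0, 3]: the distant point beats a near-duplicate
print(select_batch(points, probs, k=2, alpha=1.0))  # uncertainty only: two of the clustered points
```

With `alpha=1.0` the selection reduces to pure uncertainty sampling and spends the second label on a near-duplicate of the first.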

Why it helps

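The model typically learns most from examples it is unsure about or currently gets wrong, so concentrating labels on those can reach a given accuracy with far fewer labels than random sampling, which spends much of the budget on examples the model already handles well.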
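A classic idealized case makes the saving concrete: points on a line, noise-free labels, and a single unknown threshold separating negatives from positives. Querying the most uncertain point, the one in the middle of the interval where the boundary could still lie, is then exactly bisection. The sketch below, whose function names and threshold are hypothetical, counts how many labels each strategy needs to pin the boundary down between two adjacent points. Real problems are noisier and rarely show gains this dramatic.

```python
import math
import random

def labels_needed_active(xs, oracle):
    """Bisection on sorted xs: always query the midpoint of the interval
    where the boundary (index of the first positive) can still lie."""
    lo, hi = 0, len(xs)  # invariant: the boundary index lies in [lo, hi]
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if oracle(xs[mid]) == 1:
            hi = mid
        else:
            lo = mid + 1
    return lo, queries

def labels_needed_random(xs, oracle, rng):
    """Label points of sorted xs in random order until the boundary is pinned down."""
    order = list(range(len(xs)))
    rng.shuffle(order)
    lo, hi = 0, len(xs)
    queries = 0
    for i in order:
        if lo == hi:
            break
        queries += 1
        if oracle(xs[i]) == 1:
            hi = min(hi, i)
        else:
            lo = max(lo, i + 1)
    return lo, queries

def oracle(x):
    # Hypothetical noise-free labeler with threshold 0.3.
    return 1 if x > 0.3 else 0

rng = random.Random(0)
xs = sorted(rng.uniform(0.0, 1.0) for _ in range(1000))
true_b = sum(1 for x in xs if x <= 0.3)  # index of the first positive
b_active, q_active = labels_needed_active(xs, oracle)
b_random, q_random = labels_needed_random(xs, oracle, random.Random(1))
print(f"bisection: {q_active} labels, random order: {q_random} labels")
```

Bisection needs at most ⌈log2(n + 1)⌉ labels, which is 10 for 1,000 points. Random labeling must hit both points adjacent to the boundary, which takes about two-thirds of the pool on average.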

A caution

Active learning can over-focus on a narrow region of the input space and neglect easy but important areas, so periodically mixing in randomly sampled examples helps keep the labeled set representative of the data.
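One simple way to do this, sketched here as an assumption rather than a prescribed recipe, is to reserve a fixed fraction of every batch for uniformly random picks from the rest of the pool. The function name `mixed_batch` and the 20% default for `random_fraction` are illustrative choices, not recommendations; making every Nth round fully random is an equally simple alternative.

```python
import random

def mixed_batch(ranked, k, random_fraction=0.2, rng=None):
    """Build a batch of k ids: most come from the top of an informativeness
    ranking (most informative first), the rest uniformly at random from the remainder."""
    rng = rng or random.Random()
    n_random = round(k * random_fraction)
    n_top = k - n_random
    top = ranked[:n_top]
    rest = ranked[n_top:]
    # Random picks come only from outside the top slice, so ids never repeat.
    extra = rng.sample(rest, min(n_random, len(rest)))
    return top + extra

ranked = list(range(100))  # hypothetical candidate ids, most informative first
batch = mixed_batch(ranked, k=10, random_fraction=0.2, rng=random.Random(0))
print(batch)  # ids 0-7 from the ranking, plus two random ids from the rest
```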

Key idea

Active learning closes a loop where the model selects the most informative unlabeled examples to label, reaching target accuracy with fewer labels than random sampling.