Active Learning

The idea

Active learning reduces labeling cost by letting the model pick which examples to label next. Instead of labeling data at random, you label the examples the model would learn the most from. This often reaches a given accuracy with far fewer labels than random labeling would need.

The loop

Active learning runs as a cycle between the model and a human labeler.

Train the model on the small set of labels you have so far
Use the model to score a large pool of unlabeled data
Select the most informative examples using a query strategy
Send those to a human to label, add them to the labeled set, and repeat
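The four steps above can be sketched as a small pool-based loop. This is a toy, stdlib-only sketch under stated assumptions: the "model" is a hypothetical one-dimensional threshold classifier, the human labeler is simulated by a hypothetical `oracle` function with a hidden threshold, and the query strategy is uncertainty sampling (query the points closest to the current decision boundary). The names `oracle`, `train`, and `active_learning_loop` are illustrative, not from any library.

```python
import random

# Hypothetical ground truth, known only to the simulated labeler.
TRUE_THRESHOLD = 0.37


def oracle(x):
    """Stands in for the human labeler: returns the true label of x."""
    return 1 if x >= TRUE_THRESHOLD else 0


def train(labeled):
    """Toy 1-D threshold classifier (hypothetical).

    Places the decision boundary midway between the largest x labeled 0
    and the smallest x labeled 1.
    """
    zeros = [x for x, y in labeled if y == 0]
    ones = [x for x, y in labeled if y == 1]
    lo = max(zeros) if zeros else 0.0
    hi = min(ones) if ones else 1.0
    return (lo + hi) / 2


def active_learning_loop(pool, labeled, rounds, batch_size):
    pool = list(pool)        # unlabeled examples
    labeled = list(labeled)  # (x, y) pairs labeled so far
    for _ in range(rounds):
        # 1. Train on the labels collected so far.
        threshold = train(labeled)
        # 2. Score the pool: distance to the boundary, smaller = less confident.
        ranked = sorted(pool, key=lambda x: abs(x - threshold))
        # 3. Query strategy: take the batch_size most uncertain examples.
        batch = ranked[:batch_size]
        # 4. Ask the labeler, move the examples to the labeled set, repeat.
        for x in batch:
            labeled.append((x, oracle(x)))
            pool.remove(x)
    return train(labeled), labeled


if __name__ == "__main__":
    rng = random.Random(0)
    pool = [rng.random() for _ in range(1000)]
    seed = [(x, oracle(x)) for x in (0.05, 0.95)]
    threshold, labeled = active_learning_loop(pool, seed, rounds=5, batch_size=2)
    print(f"learned threshold {threshold:.3f} from {len(labeled)} labels")
```

Because each round queries the points nearest the current boundary, this toy loop narrows in on the hidden threshold much like a binary search, using a dozen labels out of a pool of a thousand. A real setup would replace `train` with an actual model and `oracle` with a human labeling step.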

Query strategies

The query strategy decides which unlabeled examples count as most informative.

Uncertainty sampling picks examples the model is least confident about
Query by committee trains several models and picks examples they disagree on
Diversity methods avoid picking many near-identical points in the same batch
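Each of the three strategies above reduces to a scoring or selection rule. The text names only the families, so the sketch below picks one common instance of each as an assumption: least-confidence scoring for uncertainty sampling, vote entropy for query by committee, and greedy farthest-point selection for diversity. Inputs are plain Python lists (class probabilities, committee votes, and points with a caller-supplied distance function) so that no ML library is assumed; the function names are hypothetical.

```python
import math
from collections import Counter


def least_confidence(probs):
    """Uncertainty sampling: 1 minus the top predicted class probability.
    Higher means the model is less sure of its prediction."""
    return 1.0 - max(probs)


def vote_entropy(votes):
    """Query by committee: entropy of the labels the committee members voted for.
    0 when every member agrees, larger when the votes are split."""
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in Counter(votes).values())


def diverse_batch(points, k, dist):
    """Diversity: greedy farthest-point selection. Starting from points[0]
    (e.g. the single most uncertain candidate), repeatedly add the point whose
    nearest already-chosen point is farthest away, so near-duplicates are skipped."""
    if not points:
        return []
    chosen = [0]
    while len(chosen) < min(k, len(points)):
        remaining = [i for i in range(len(points)) if i not in chosen]
        best = max(remaining,
                   key=lambda i: min(dist(points[i], points[j]) for j in chosen))
        chosen.append(best)
    return [points[i] for i in chosen]


# Usage on small hand-made inputs.
probs = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3]]
most_uncertain = max(range(len(probs)), key=lambda i: least_confidence(probs[i]))
print(most_uncertain)  # → 1

print(vote_entropy(["cat", "dog", "cat", "dog"]))  # ln 2, the two-class maximum
print(diverse_batch([0.0, 0.01, 0.02, 1.0, 0.5], k=3, dist=lambda a, b: abs(a - b)))
# → [0.0, 1.0, 0.5]
```

In practice these are often combined: rank by an uncertainty or disagreement score first, then run the diversity step over the top candidates so one batch does not spend its labels on near-copies.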

When it helps and its risks

Active learning shines when unlabeled data is plentiful but labeling is expensive, such as medical images that need an expert to annotate. The main risk is sampling bias: the training set contains only the points the model chose, so it can drift away from the real data distribution. Mixing in some randomly sampled points helps keep the labeled data representative.
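One simple way to mix in random samples is to reserve a fixed share of each batch for uniform random picks. The sketch below assumes a hypothetical `uncertainty` callable (higher means more uncertain) and a hypothetical `random_fraction` parameter; both are illustrative choices rather than a standard recipe. The rest of the batch is filled with the most uncertain examples.

```python
import random


def mixed_batch(pool, uncertainty, batch_size, random_fraction, rng):
    """Return batch_size examples: the most uncertain ones, plus a share
    drawn uniformly at random from the remaining pool to limit sampling bias."""
    n_random = round(batch_size * random_fraction)
    n_uncertain = batch_size - n_random
    # Rank pool indices from most to least uncertain.
    ranked = sorted(range(len(pool)), key=lambda i: uncertainty(pool[i]), reverse=True)
    top, rest = ranked[:n_uncertain], ranked[n_uncertain:]
    # Random picks come only from points not already chosen, so no duplicates.
    explore = rng.sample(rest, min(n_random, len(rest)))
    return [pool[i] for i in top + explore]


# Usage: a toy pool where points near 50 are the "uncertain" ones.
rng = random.Random(0)
pool = list(range(100))
batch = mixed_batch(pool, lambda x: -abs(x - 50), batch_size=10,
                    random_fraction=0.3, rng=rng)
print(batch)  # 7 points around 50, then 3 random points from elsewhere
```

A `random_fraction` of 0 recovers pure uncertainty sampling and 1 recovers plain random sampling; values in between trade informativeness against representativeness.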

Key idea

Active learning queries the most informative unlabeled examples for labeling, reaching high accuracy with fewer labels.
