Machine Learning

Active Learning

Let the model choose which unlabeled examples are most worth labeling.

5 min read · advanced

The idea

Active learning reduces labeling cost by letting the model pick which examples to label next. Instead of labeling data at random, you label the examples the model is expected to learn the most from, so you can often reach the same accuracy with far fewer labels.

The loop

Active learning runs as a cycle between the model and a human labeler.

  • Train the model on the small set of labels you have so far
  • Use the model to score a large pool of unlabeled data
  • Select the most informative examples by a query strategy
  • Send those to a human to label, then add the new labels to the training set and repeat
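
The four steps above can be sketched on a toy problem. This is a minimal illustration, not a reference implementation: the 1-D threshold model, the `oracle` function standing in for the human labeler, the hidden boundary at 0.62, and the pool of random numbers are all assumptions made up for the example.

```python
import random

def oracle(x):
    """Hypothetical stand-in for the human labeler (hidden boundary at 0.62)."""
    return int(x >= 0.62)

def fit_threshold(labeled):
    """Toy model: threshold halfway between the largest 0 and the smallest 1."""
    zeros = [x for x, y in labeled if y == 0]
    ones = [x for x, y in labeled if y == 1]
    return (max(zeros) + min(ones)) / 2

def active_learning_loop(pool, labeled, rounds=5, batch_size=1):
    pool = list(pool)
    labeled = list(labeled)
    for _ in range(rounds):
        # 1. Train on the labels collected so far.
        threshold = fit_threshold(labeled)
        # 2. Score the pool: points closest to the threshold are least certain.
        pool.sort(key=lambda x: abs(x - threshold))
        # 3. Select the most informative examples and remove them from the pool.
        queries, pool = pool[:batch_size], pool[batch_size:]
        # 4. Ask the labeler, add the new labels, and repeat.
        labeled += [(x, oracle(x)) for x in queries]
    return fit_threshold(labeled), labeled

rng = random.Random(0)
pool = [rng.random() for _ in range(1000)]   # unlabeled pool (assumed data)
seed = [(0.0, 0), (1.0, 1)]                  # the small initial labeled set
threshold, labeled = active_learning_loop(pool, seed, rounds=8)
print(f"learned threshold ~ {threshold:.3f} from {len(labeled)} labels")
```

In this toy setting each query roughly halves the region where the boundary could lie, which is why a handful of labels is enough. Real models and real data rarely behave this cleanly.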

Query strategies

The query strategy decides what is most informative.

  • Uncertainty sampling picks examples the model is least confident about
  • Query by committee trains several models and picks examples they disagree on
  • Diversity methods avoid picking many near-identical points
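
Each strategy above boils down to a scoring or filtering rule. The sketch below shows one common form of each, using only the standard library. The function names, the example probabilities, and the 1-D features are illustrative assumptions, and other formulations (such as margin or entropy-based uncertainty) are equally valid.

```python
import math

def least_confidence(probs):
    """Uncertainty sampling score: 1 minus the top predicted probability."""
    return 1.0 - max(probs)

def entropy(probs):
    """Shannon entropy of a probability distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def vote_entropy(votes):
    """Query-by-committee score: entropy of the committee's label votes."""
    counts = {}
    for v in votes:
        counts[v] = counts.get(v, 0) + 1
    return entropy([c / len(votes) for c in counts.values()])

def select_diverse(candidates, k, min_gap):
    """Greedy diversity filter over (feature, uncertainty) pairs.

    Takes the most uncertain candidates first and skips any whose
    feature lies within min_gap of an already chosen one.
    """
    chosen = []
    for x, u in sorted(candidates, key=lambda c: c[1], reverse=True):
        if all(abs(x - cx) >= min_gap for cx, _ in chosen):
            chosen.append((x, u))
        if len(chosen) == k:
            break
    return chosen

# Hypothetical class probabilities predicted for three pool examples.
pool_probs = {"a": [0.98, 0.02], "b": [0.55, 0.45], "c": [0.80, 0.20]}
most_uncertain = max(pool_probs, key=lambda k: least_confidence(pool_probs[k]))

# Two near-identical uncertain points and one distinct point.
picked = select_diverse([(0.50, 0.9), (0.51, 0.85), (0.90, 0.6)], k=2, min_gap=0.05)
```

Here `most_uncertain` is `"b"`, the example whose top probability is closest to a coin flip, and `picked` keeps 0.50 and 0.90 while skipping the near-duplicate at 0.51.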

When it helps and its risks

Active learning shines when unlabeled data is plentiful but labeling is expensive, such as medical images. A risk is sampling bias: the labeled set contains only the points the model chose, so it stops being a representative sample of the data and can skew training. Mixing in some random samples helps keep the data representative.
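
One way to mix in random samples is to reserve part of each labeling batch for uniformly random picks. The sketch below assumes uncertainty scores are already computed per example. The `select_batch` name and the 20% random share are illustrative choices, not recommended values.

```python
import random

def select_batch(uncertainty, batch_size, random_fraction=0.2, rng=None):
    """Pick a labeling batch that mixes uncertain and random examples.

    uncertainty: dict mapping example id -> uncertainty score.
    random_fraction: share of the batch drawn uniformly at random
    from the examples not already chosen by uncertainty.
    """
    rng = rng or random.Random()
    n_random = int(round(batch_size * random_fraction))
    n_top = batch_size - n_random
    # Rank example ids from most to least uncertain.
    ranked = sorted(uncertainty, key=uncertainty.get, reverse=True)
    top, rest = ranked[:n_top], ranked[n_top:]
    # Fill the remaining slots with random picks from everything else.
    random_picks = rng.sample(rest, min(n_random, len(rest)))
    return top + random_picks

# Hypothetical scores for 100 pool examples (higher = less certain).
scores = {i: i / 100 for i in range(100)}
batch = select_batch(scores, batch_size=10, random_fraction=0.2, rng=random.Random(0))
```

With these settings the first eight ids in `batch` are the most uncertain ones and the last two are drawn at random from the remaining pool.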

Key idea

Active learning queries the most informative unlabeled examples for labeling, reaching high accuracy with fewer labels.

Check yourself

1. What does active learning optimize for?

2. What does uncertainty sampling select?

3. What risk does active learning introduce?