Weak Supervision

How noisy labeling functions combine into training labels without hand annotation.

Labels without manual annotation

Hand labeling is slow, so weak supervision generates training labels programmatically. You write many imperfect rules and combine their noisy outputs into a single probabilistic label per example.

Labeling functions

  • A labeling function is a small piece of logic that votes for a label based on a heuristic, a keyword, a regex, or an external knowledge base.
  • Each one is noisy and incomplete: it is sometimes wrong, and it abstains on examples it does not cover.
  • You write dozens of them, each capturing a different signal.
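As a rough illustration, here is a minimal Python sketch of three labeling functions for a hypothetical spam-detection task. The task, the function names, and the convention of returning -1 to mean "abstain" are assumptions made for this example, not the API of any particular library. Applying every function to every example produces a label matrix with one row per example and one column per function.

```python
import re

# Hypothetical label convention for this sketch: -1 means "abstain".
ABSTAIN = -1
HAM = 0
SPAM = 1


def lf_contains_link(text: str) -> int:
    """Keyword heuristic: a link is a weak hint of spam."""
    return SPAM if "http://" in text or "https://" in text else ABSTAIN


def lf_money_regex(text: str) -> int:
    """Regex heuristic: dollar amounts such as '$1,000' suggest spam."""
    return SPAM if re.search(r"\$\d[\d,]*", text) else ABSTAIN


def lf_short_reply(text: str) -> int:
    """Short replies like 'thanks, see you' are usually legitimate."""
    return HAM if len(text.split()) <= 3 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_money_regex, lf_short_reply]


def apply_lfs(texts: list[str]) -> list[list[int]]:
    """Build the label matrix: one row per example, one column per function."""
    return [[lf(t) for lf in LABELING_FUNCTIONS] for t in texts]


texts = [
    "Win $1,000 now at https://example.com",
    "thanks, see you",
    "Meeting moved to 3pm tomorrow",
]
label_matrix = apply_lfs(texts)
for text, row in zip(texts, label_matrix):
    print(row, text)
```

The first message is covered by two agreeing functions, the second by one, and the third by none at all, which is exactly the incompleteness described above.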

Combining the votes

  • A label model estimates how accurate and how correlated the functions are, then merges their votes into a probabilistic label.
  • Crucially, it does this without ground truth, by analyzing where the functions agree and disagree.
  • The resulting soft (probabilistic) labels are then used to train an ordinary downstream model.
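To make the idea concrete, here is a minimal sketch of a label model. It assumes a binary task, a -1-means-abstain convention, and labeling functions that are conditionally independent given the true label. It uses a simple expectation-maximization (EM) scheme in the spirit of the Dawid-Skene "one-coin" model, swapped in here for readability; it is not the algorithm of any specific weak-supervision library, and unlike the label model described above it does not estimate correlations between functions. It never looks at ground truth: each function's accuracy is estimated from how often it agrees with the current soft labels, and the soft labels are then recomputed from accuracy-weighted votes.

```python
import math

ABSTAIN = -1  # assumed convention: the labeling function declined to vote


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_label_model(L, n_iter=50, eps=1e-3):
    """Estimate labeling-function accuracies and soft labels without ground truth.

    L: list of rows, one per example; entries are 1, 0, or ABSTAIN.
    Returns (accuracies, probs), where probs[i] is the estimated P(y_i = 1).
    Assumes conditional independence and equal accuracy on both classes.
    """
    n, m = len(L), len(L[0])
    # Start from a soft majority vote: the fraction of non-abstaining votes for 1.
    probs = []
    for row in L:
        votes = [v for v in row if v != ABSTAIN]
        probs.append(sum(votes) / len(votes) if votes else 0.5)

    acc = [0.5] * m
    for _ in range(n_iter):
        # M-step: accuracy = expected agreement with the current soft labels,
        # averaged over the examples where the function actually voted.
        for j in range(m):
            agree, total = 0.0, 0
            for i in range(n):
                v = L[i][j]
                if v == ABSTAIN:
                    continue
                agree += probs[i] if v == 1 else 1.0 - probs[i]
                total += 1
            acc[j] = min(max(agree / total, eps), 1 - eps) if total else 0.5
        prior = min(max(sum(probs) / n, eps), 1 - eps)

        # E-step: sum each vote's log-odds weight; abstains contribute nothing.
        for i in range(n):
            log_odds = math.log(prior / (1 - prior))
            for j in range(m):
                v = L[i][j]
                if v == ABSTAIN:
                    continue
                w = math.log(acc[j] / (1 - acc[j]))
                log_odds += w if v == 1 else -w
            probs[i] = _sigmoid(log_odds)
    return acc, probs


# Toy label matrix: functions 0 and 1 always agree; function 2 is near a coin flip.
L = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 0, 0],
    [0, 0, 1],
    [1, 1, ABSTAIN],
    [0, 0, ABSTAIN],
]
accuracies, soft_labels = fit_label_model(L)
print("estimated accuracies:", [round(a, 2) for a in accuracies])
print("soft labels P(y=1):", [round(p, 2) for p in soft_labels])
```

Because the third function agrees with the other two only half the time, its estimated accuracy settles near 0.5 and its vote carries almost no weight. The soft labels, rather than hard 0/1 decisions, are what would feed the downstream model, for example as targets in a cross-entropy loss.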

Why it works

  • Many weak signals that each do better than random can, once combined and denoised, approach the quality of hand labels at a fraction of the cost.
  • It also makes labeling reusable: editing a labeling function and rerunning it relabels the whole dataset, with no new manual annotation.
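The first point can be checked with a back-of-the-envelope calculation. The Python sketch below assumes an idealized setting that real labeling functions rarely satisfy: binary labels, an odd number of functions that never abstain, each correct with the same probability p, and fully independent errors. Under those assumptions, the probability that a plain majority vote is correct follows the binomial distribution.

```python
from math import comb


def majority_vote_accuracy(n: int, p: float) -> float:
    """Probability that a majority of n independent voters is correct.

    Assumes binary labels, odd n (so no ties), no abstains, and that every
    voter is independently correct with probability p.
    """
    need = n // 2 + 1  # fewest correct votes that still form a majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


for n in (1, 3, 5, 11, 25):
    acc = majority_vote_accuracy(n, 0.7)
    print(f"{n:>2} functions at p=0.7 -> majority correct with prob {acc:.3f}")
```

With p = 0.7, one function is right 70% of the time, three give about 0.784, and five about 0.837, and the value keeps rising as n grows. Real labeling functions are correlated and abstain often, so the gains are smaller in practice, which is one reason the label model estimates accuracies and correlations instead of simply counting votes.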

Key idea

Weak supervision writes many noisy labeling functions and denoises their votes with a label model, producing training labels cheaply without manual annotation.

Check yourself

1. What is a labeling function in weak supervision?

2. How does the label model combine votes without ground truth?