Weak Supervision

How noisy labeling functions combine into training labels without hand annotation.

Labels without manual annotation

Hand labeling is slow, so weak supervision generates training labels programmatically. You write many imperfect rules and combine their noisy outputs into a single probabilistic label per example.

Labeling functions

A labeling function is a small piece of logic that votes for a label using a heuristic, a keyword, a regex, or an external knowledge base.
Each one is noisy and incomplete: it is sometimes wrong, and it abstains on examples it does not cover.
You write dozens of them, each capturing a different signal.
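
As a rough illustration, the sketch below defines three labeling functions for a made-up spam task. The label constants, the `message` dictionary layout, the `KNOWN_SENDERS` set, and every function name are assumptions chosen for this example, not part of any particular library.

```python
import re

# Hypothetical label encoding: -1 means the function abstains.
ABSTAIN, HAM, SPAM = -1, 0, 1

# Stand-in for an external knowledge base (assumed for illustration).
KNOWN_SENDERS = {"alice@example.com", "billing@example.com"}


def lf_keyword_free(message):
    """Keyword rule: the word 'free' hints at spam."""
    return SPAM if "free" in message["text"].lower() else ABSTAIN


def lf_regex_url(message):
    """Regex rule: a raw URL in the body hints at spam."""
    return SPAM if re.search(r"https?://\S+", message["text"]) else ABSTAIN


def lf_known_sender(message):
    """Knowledge-base rule: mail from a known sender is probably not spam."""
    return HAM if message["sender"] in KNOWN_SENDERS else ABSTAIN


LABELING_FUNCTIONS = [lf_keyword_free, lf_regex_url, lf_known_sender]


def apply_lfs(messages):
    """Return one row of votes per message, one column per labeling function."""
    return [[lf(m) for lf in LABELING_FUNCTIONS] for m in messages]


messages = [
    {"sender": "promo@deals.example", "text": "Claim your FREE prize at http://deals.example/win"},
    {"sender": "alice@example.com", "text": "Lunch at noon?"},
    {"sender": "bob@example.org", "text": "Minutes from today's meeting attached."},
]
votes = apply_lfs(messages)
```

Each row of `votes` holds one message's votes. The third message is covered by no rule, so all of its votes are abstentions, which is the incompleteness described above.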

Combining the votes

A label model estimates how accurate and how correlated the functions are, then merges their votes into a probabilistic label.
Crucially, it does this without ground truth, by analyzing where the functions agree and disagree.
The resulting soft labels train a normal downstream model.
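
One way to see how accuracies can be estimated from agreement alone is the simplified sketch below. It makes strong assumptions that a real label model may not: binary labels, a uniform class prior, one symmetric accuracy per function, and functions that err independently, so it does not model the correlations mentioned above. The function name `fit_label_model` and the clipping bounds are illustrative choices.

```python
import math

ABSTAIN = -1  # assumed encoding for "no vote"; labels are 0 or 1


def fit_label_model(votes, n_iter=20):
    """Estimate per-function accuracies and soft labels without ground truth.

    Simplified EM under the assumptions stated above. `votes` is a list of
    rows, one per example, with one entry per labeling function.
    """
    n_lfs = len(votes[0])

    # Start from a soft majority vote: share of cast votes that say 1.
    probs = []
    for row in votes:
        cast = [v for v in row if v != ABSTAIN]
        probs.append(sum(cast) / len(cast) if cast else 0.5)

    accs = [0.5] * n_lfs
    for _ in range(n_iter):
        # M-step: accuracy = expected share of a function's votes that
        # match the current soft labels.
        for j in range(n_lfs):
            agree, total = 0.0, 0
            for row, p in zip(votes, probs):
                if row[j] == ABSTAIN:
                    continue
                agree += p if row[j] == 1 else 1.0 - p
                total += 1
            acc = agree / total if total else 0.5
            accs[j] = min(max(acc, 0.05), 0.95)  # clip to keep log-odds finite

        # E-step: combine votes, weighting each by its accuracy log-odds.
        for i, row in enumerate(votes):
            log_odds = 0.0
            for j, v in enumerate(row):
                if v == ABSTAIN:
                    continue
                weight = math.log(accs[j] / (1.0 - accs[j]))
                log_odds += weight if v == 1 else -weight
            probs[i] = 1.0 / (1.0 + math.exp(-log_odds))

    return probs, accs


votes = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 0, 0],
    [0, 0, 1],
    [1, 1, ABSTAIN],
    [0, 0, ABSTAIN],
    [ABSTAIN, ABSTAIN, ABSTAIN],
]
soft_labels, accuracies = fit_label_model(votes)
```

In this toy matrix the first two functions always agree with each other while the third often contradicts them, so the model assigns the third a lower estimated accuracy and a smaller weight. An example with no votes stays at probability 0.5. The values in `soft_labels` could then serve as targets for a downstream classifier trained with a loss that accepts probabilistic labels.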

Why it works

Many weak signals that each beat random can, once combined and denoised, approach the quality of hand labels at a fraction of the cost.
It also makes labeling reusable: after you update a labeling function, rerunning the functions relabels the whole dataset with no further hand annotation.
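
The intuition can be checked with a small calculation. The sketch below assumes an idealized setting that real labeling functions only approximate: every function votes on every example, all share the same accuracy, and their errors are independent. Under those assumptions it computes how often a plain majority vote is correct. The function name `majority_accuracy` is made up for this example.

```python
from math import comb


def majority_accuracy(p, n):
    """Probability that a majority of n independent voters is correct.

    Assumes each voter is right with probability p, never abstains,
    and that n is odd so ties cannot occur.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd to avoid ties")
    needed = n // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(needed, n + 1)
    )


# Accuracy of the combined vote as more 60%-accurate voters are added.
curve = {n: majority_accuracy(0.6, n) for n in (1, 3, 5, 11, 25, 51)}
```

The values in `curve` rise as `n` grows, which is the effect described above. Correlated functions improve more slowly than this idealized curve suggests, which is one reason a label model estimates correlations rather than counting votes.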

Key idea

In weak supervision, you write many noisy labeling functions and denoise their votes with a label model, producing training labels cheaply without manual annotation.
