Labels without manual annotation
Hand labeling is slow, so weak supervision generates training labels programmatically. You write many imperfect rules and combine their noisy outputs into a single probabilistic label per example.
Labeling functions
- A labeling function is a small piece of logic that votes for a label based on a heuristic, a keyword, a regex, or an external knowledge base.
- Each one is noisy and incomplete. It can be wrong on some examples, and it abstains on examples it does not cover.
- You write dozens of them, each capturing a different signal.
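As a rough illustration of the bullets above, here is a minimal sketch of three labeling functions for a toy spam task. Everything in it is an assumption made for the example: the task, the label constants, the function names, and the tiny TRUSTED_DOMAINS set standing in for an external knowledge base.

```python
import re

# Hypothetical label scheme for a toy spam task; ABSTAIN means "no vote".
ABSTAIN, HAM, SPAM = -1, 0, 1

# Hypothetical stand-in for an external knowledge base of trusted domains.
TRUSTED_DOMAINS = {"example.org"}


def lf_keyword(text: str) -> int:
    """Keyword heuristic: the phrase 'free money' suggests spam."""
    return SPAM if "free money" in text.lower() else ABSTAIN


def lf_url(text: str) -> int:
    """Regex heuristic: a raw http(s) link suggests spam."""
    return SPAM if re.search(r"https?://", text) else ABSTAIN


def lf_trusted_sender(text: str) -> int:
    """Knowledge-base lookup: a trusted domain suggests ham."""
    return HAM if any(d in text for d in TRUSTED_DOMAINS) else ABSTAIN


LABELING_FUNCTIONS = [lf_keyword, lf_url, lf_trusted_sender]


def apply_lfs(texts):
    """Return one row of votes per text, one column per labeling function."""
    return [[lf(t) for lf in LABELING_FUNCTIONS] for t in texts]


votes = apply_lfs([
    "Free money at http://x.biz",
    "Notes from alice@example.org",
    "See you at lunch",
])
```

Each row of `votes` holds one vote per function. The third text is covered by none of the functions, so every entry in its row is ABSTAIN, which is the incompleteness described above.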
Combining the votes
- A label model estimates how accurate and how correlated the functions are, then merges their votes into a probabilistic label.
- Crucially, it does this without ground truth, by analyzing where the functions agree and disagree.
- The resulting soft labels train a normal downstream model.
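The combining step can be sketched with a deliberately simplified label model. This is not the estimator of any particular library. It assumes binary labels, a balanced class prior, one symmetric accuracy per function, and functions that are conditionally independent given the true label, so unlike the label model described above it ignores correlations. It alternates between computing soft labels from the current accuracy estimates and re-estimating each accuracy from how often a function agrees with those soft labels. The names fit_label_model and posterior are hypothetical.

```python
import math

ABSTAIN = -1  # assumed encoding: votes are 0, 1, or ABSTAIN


def posterior(row, acc):
    """P(y = 1) for one example, given per-function accuracies."""
    log_odds = 0.0
    for vote, a in zip(row, acc):
        if vote == ABSTAIN:
            continue  # an abstaining function contributes no evidence
        weight = math.log(a / (1.0 - a))
        log_odds += weight if vote == 1 else -weight
    return 1.0 / (1.0 + math.exp(-log_odds))


def fit_label_model(votes, n_iter=20):
    """Estimate accuracies and soft labels from votes alone (no ground truth)."""
    n_lfs = len(votes[0])
    acc = [0.7] * n_lfs  # initial guess: each function beats random
    for _ in range(n_iter):
        probs = [posterior(row, acc) for row in votes]
        for j in range(n_lfs):
            agree, total = 0.0, 0
            for row, p in zip(votes, probs):
                if row[j] == ABSTAIN:
                    continue
                # Expected agreement with the current soft label.
                agree += p if row[j] == 1 else 1.0 - p
                total += 1
            if total:
                # Clip so the log-odds weight stays finite.
                acc[j] = min(max(agree / total, 0.05), 0.95)
    return acc, [posterior(row, acc) for row in votes]


# Functions 0 and 1 always agree; function 2 disagrees on two examples.
example_votes = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 0, 0],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
]
accuracies, soft_labels = fit_label_model(example_votes)
```

On this toy input the function that disagrees with the other two ends up with a lower estimated accuracy, so its votes carry less weight in the soft labels. The entries of `soft_labels` are probabilities that can serve directly as training targets for the downstream model.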
Why it works
- Many weak signals that each beat random can, once combined and denoised, approach the quality of hand labels at a fraction of the cost.
- It also makes labeling reusable, since updating a labeling function and rerunning it relabels the whole dataset with no new annotation work.
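A small calculation gives some intuition for the first point. It is an idealized sketch: it assumes n functions that never abstain, vote independently, and share the same accuracy p, combined by a plain majority vote. Real labeling functions are correlated and abstain, so the numbers are an optimistic bound, not a prediction. The function name majority_vote_accuracy is hypothetical.

```python
from math import comb


def majority_vote_accuracy(n: int, p: float) -> float:
    """Probability that a majority of n independent voters is correct.

    Assumes n is odd (no ties) and every voter is correct with probability p.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd so that ties cannot occur")
    return sum(
        comb(n, k) * p**k * (1.0 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


# Accuracy of the combined vote for a growing number of 60%-accurate voters.
for n in (1, 3, 11, 101):
    print(n, round(majority_vote_accuracy(n, 0.6), 3))
```

Under these assumptions the combined accuracy rises as more voters are added whenever p is above 0.5, and it falls when p is below 0.5, which is why each signal needs to beat random.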
Key idea
Weak supervision writes many noisy labeling functions and denoises their votes with a label model, producing training labels cheaply without manual annotation.