Machine Learning

Label Bias

When the ground truth itself is wrong or unfairly assigned.

The target can lie

Supervised learning trusts its labels as truth. But labels are produced by humans or proxy processes, and they can carry bias of their own. Label bias means the supposed ground truth systematically misrepresents reality for some groups.

Where it comes from

  • Subjective judgments: annotators disagree and bring their own assumptions.
  • Proxy targets: you wanted to predict need but could only measure past spending, which reflects access, not need.
  • Feedback loops: past decisions shaped who got labeled and how, baking old unfairness into new labels.

A classic trap

Suppose arrests are used as a proxy for crime. If policing focused on certain neighborhoods, arrest labels overcount crime in those areas relative to others. A model then learns to overpredict there, not because there is more crime but because there is more labeling.
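
To make the trap concrete, here is a minimal sketch with made-up numbers. It assumes two neighborhoods, A and B, with the same underlying crime rate but different policing intensity, modeled as the share of crimes that end in an arrest. The names TRUE_CRIME_RATE and DETECTION_RATE and all values are hypothetical, chosen only for illustration.

```python
# Hypothetical numbers for illustration only, not real data.
TRUE_CRIME_RATE = 0.05  # identical underlying rate in both neighborhoods

# Assumed share of crimes that lead to an arrest (policing intensity).
DETECTION_RATE = {"A": 0.60, "B": 0.20}


def expected_arrest_rate(neighborhood: str) -> float:
    """Expected arrests per resident: true crime rate times detection rate."""
    return TRUE_CRIME_RATE * DETECTION_RATE[neighborhood]


# The "label" a model would train on is the arrest rate, not the crime rate.
arrest_rates = {n: expected_arrest_rate(n) for n in DETECTION_RATE}
label_ratio = arrest_rates["A"] / arrest_rates["B"]

for name, rate in arrest_rates.items():
    print(f"neighborhood {name}: true crime rate {TRUE_CRIME_RATE:.3f}, "
          f"arrest label rate {rate:.3f}")
print(f"A looks about {label_ratio:.1f}x riskier than B in the labels")
```

Under these assumptions the labels make A look about three times riskier than B even though the true rates are equal, so the gap comes entirely from how the labels were collected.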

What helps

  • Question whether the label is the true outcome or a stand-in.
  • Measure label disagreement across annotators and groups.
  • Prefer outcome labels that are observed directly when possible.
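
One simple way to measure disagreement, sketched below, is to count how often annotators fail to agree on an item and compare that rate across groups. The data, the group names, and the helper disagreement_rate_by_group are all hypothetical, and the unanimity check is just one possible measure; chance-corrected agreement statistics may suit real projects better.

```python
from collections import defaultdict

# Hypothetical annotations: (item id, group, labels from three annotators).
annotations = [
    ("item1", "group_a", [1, 1, 1]),
    ("item2", "group_a", [0, 0, 0]),
    ("item3", "group_a", [1, 1, 0]),
    ("item4", "group_b", [1, 0, 0]),
    ("item5", "group_b", [0, 1, 1]),
    ("item6", "group_b", [1, 1, 1]),
]


def disagreement_rate_by_group(rows):
    """Return, per group, the fraction of items whose labels are not unanimous."""
    totals = defaultdict(int)
    disputed = defaultdict(int)
    for _item_id, group, labels in rows:
        totals[group] += 1
        if len(set(labels)) > 1:  # more than one distinct label means a conflict
            disputed[group] += 1
    return {group: disputed[group] / totals[group] for group in totals}


rates = disagreement_rate_by_group(annotations)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.2f} of items have conflicting labels")
```

A noticeably higher rate for one group, as with group_b in this toy data, suggests the labels for that group are less reliable and deserve a closer look before training on them.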

Key idea

Label bias means the training target itself misrepresents reality, often through subjective annotation, proxy outcomes, or feedback loops, so always ask whether the label is truth or a biased stand-in.

Check yourself

1. What is a proxy label?

2. Why are feedback loops dangerous for labels?