What bias evaluation targets
Bias evaluation asks whether a model treats people differently based on attributes like gender, race, age, or religion. The concern is unjustified differences in tone, accuracy, or assumptions tied to group identity.
Common methods
- Counterfactual tests, swapping only the group term and checking whether the answer's quality, tone, or content changes without justification.
- Association probes, measuring whether stereotypical completions are preferred.
- Disaggregated accuracy, reporting task performance per group to expose gaps.
- Sentiment and regard measures, comparing the polarity of generated text toward each demographic group mentioned.
As a working baseline, a fair model gives equivalent quality and tone when only the identity term differs and the identity is irrelevant to the task.
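A counterfactual test can be sketched in a few lines. The version below is only an illustration: `score` is a hypothetical callable standing in for "run the model and rate its response", and the template, group terms, toy scorer, and 0.1 threshold are all assumptions made for the example, not values from any benchmark.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple


def counterfactual_gaps(
    template: str,
    group_terms: List[str],
    score: Callable[[str], float],
) -> Dict[Tuple[str, str], float]:
    """Fill the template with each group term and return pairwise score gaps."""
    # Only the {group} slot changes between prompts; everything else is fixed.
    scores = {term: score(template.format(group=term)) for term in group_terms}
    return {
        (a, b): abs(scores[a] - scores[b])
        for a, b in combinations(group_terms, 2)
    }


def toy_score(prompt: str) -> float:
    """Hypothetical stand-in: a real test would query the model and rate the reply."""
    return 0.5 if "older" in prompt else 0.8


gaps = counterfactual_gaps(
    "The {group} applicant asked about the loan terms.",
    ["younger", "older"],
    toy_score,
)

# Illustrative threshold (an assumption); flagged pairs go to human review.
flagged = {pair: gap for pair, gap in gaps.items() if gap > 0.1}
print(flagged)
```

A flagged pair is a prompt for closer inspection, not proof of bias, since the sketch cannot tell whether the difference is justified by context.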
Subtleties
Bias is not one number. A model can be fair on tone yet biased on accuracy, or fair for one group and not another. Some differences reflect legitimate context, such as medical guidance that properly depends on age, so a flat equality rule can mislabel correct behavior as bias. Definitions of fairness can also conflict, forcing explicit trade-off choices.
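To make the conflict between definitions concrete, here is a small sketch using two widely used criteria, chosen for illustration and not named in the text above: demographic parity (equal rates of positive predictions across groups) and equal opportunity (equal true positive rates across groups). The data is invented, and the predictions are assumed to be perfectly accurate, which is enough to show the two criteria pulling apart when base rates differ.

```python
from typing import List, Tuple

Record = Tuple[int, int]  # (true_label, predicted_label), both 0 or 1


def selection_rate(records: List[Record]) -> float:
    """Share of records that received a positive prediction."""
    return sum(pred for _, pred in records) / len(records)


def true_positive_rate(records: List[Record]) -> float:
    """Share of truly positive records that were predicted positive."""
    preds_for_positives = [pred for label, pred in records if label == 1]
    return sum(preds_for_positives) / len(preds_for_positives)


# Invented data: predictions match labels exactly, but base rates differ.
group_a: List[Record] = [(1, 1)] * 6 + [(0, 0)] * 4  # 60% truly positive
group_b: List[Record] = [(1, 1)] * 3 + [(0, 0)] * 7  # 30% truly positive

# Equal opportunity gap: zero, because every true positive is found in both groups.
opportunity_gap = abs(true_positive_rate(group_a) - true_positive_rate(group_b))

# Demographic parity gap: nonzero, because the groups are selected at different rates.
parity_gap = abs(selection_rate(group_a) - selection_rate(group_b))

print(f"equal opportunity gap:  {opportunity_gap:.2f}")
print(f"demographic parity gap: {parity_gap:.2f}")
```

In this toy case, closing the parity gap would require making wrong predictions for some people, which is the kind of explicit trade-off the evaluation has to state rather than hide.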
Doing it responsibly
Use diverse, validated test sets, report per-group results rather than a single average, and document the fairness definition in use. Pair quantitative probes with human review, because automated metrics miss context that determines whether a difference is harmful.
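The case for per-group reporting is easy to see with a small sketch. The group labels and outcomes below are made up for illustration; the point is only that a single average can look acceptable while one group does markedly worse.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_group(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute accuracy separately for each group label."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for group, is_correct in records:
        total[group] += 1
        correct[group] += int(is_correct)
    return {group: correct[group] / total[group] for group in total}


# Invented evaluation results: (group label, whether the model answered correctly).
records = (
    [("A", True)] * 9 + [("A", False)] * 1
    + [("B", True)] * 6 + [("B", False)] * 4
)

overall = sum(ok for _, ok in records) / len(records)
per_group = accuracy_by_group(records)
worst_gap = max(per_group.values()) - min(per_group.values())

print(f"overall accuracy: {overall:.2f}")
for group, acc in sorted(per_group.items()):
    print(f"  group {group}: {acc:.2f}")
print(f"largest gap between groups: {worst_gap:.2f}")
```

Reporting the per-group table alongside the gap keeps the disparity visible; whether that gap is harmful is still a question for human review.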
Key idea
Bias evaluation uses counterfactual swaps and per-group reporting to find unjustified differences tied to identity, but fairness has multiple definitions that can conflict, so context and human judgment must accompany the numbers.