Why measure fairness
A model can be accurate overall yet treat groups unequally, for example approving loans at very different rates for two populations. Fairness metrics make these gaps measurable so they can be examined and addressed.
Common metrics
Different definitions of fairness compare different quantities across the groups defined by a sensitive attribute.
- Demographic parity asks whether the positive prediction rate is the same across groups
- Equal opportunity asks whether the true positive rate is the same across groups
- Equalized odds asks whether both true positive and false positive rates match across groups
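As a minimal sketch, the three definitions above can be written as gaps between two groups' rates, computed from confusion-matrix counts. The GroupCounts container, the function names, and the counts are hypothetical and chosen only for illustration. Summarizing equalized odds as the larger of the two rate gaps is one common convention, not the only one.

```python
from dataclasses import dataclass


@dataclass
class GroupCounts:
    """Confusion-matrix counts for one group (hypothetical container)."""
    tp: int  # actual positive, predicted positive
    fp: int  # actual negative, predicted positive
    fn: int  # actual positive, predicted negative
    tn: int  # actual negative, predicted negative


def positive_rate(c: GroupCounts) -> float:
    """Share of the group that receives a positive prediction."""
    return (c.tp + c.fp) / (c.tp + c.fp + c.fn + c.tn)


def true_positive_rate(c: GroupCounts) -> float:
    """Share of actual positives that are predicted positive."""
    return c.tp / (c.tp + c.fn)


def false_positive_rate(c: GroupCounts) -> float:
    """Share of actual negatives that are predicted positive."""
    return c.fp / (c.fp + c.tn)


def fairness_gaps(a: GroupCounts, b: GroupCounts) -> dict:
    """Absolute gap between two groups under each definition (0 means parity)."""
    tpr_gap = abs(true_positive_rate(a) - true_positive_rate(b))
    fpr_gap = abs(false_positive_rate(a) - false_positive_rate(b))
    return {
        "demographic_parity": abs(positive_rate(a) - positive_rate(b)),
        "equal_opportunity": tpr_gap,
        # Equalized odds needs both rates to match, so report the worse gap.
        "equalized_odds": max(tpr_gap, fpr_gap),
    }


# Made-up counts for two groups of 100 people each.
group_a = GroupCounts(tp=40, fp=10, fn=10, tn=40)
group_b = GroupCounts(tp=20, fp=5, fn=20, tn=55)

gaps = fairness_gaps(group_a, group_b)
for name, gap in gaps.items():
    print(f"{name}: {gap:.3f}")
```

A gap near zero means the two groups are treated alike under that definition. How large a gap is acceptable is a judgment the code does not make.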
The impossibility result
These definitions can conflict. A well-known result shows that when base rates (the share of truly positive cases) differ between groups, an imperfect model generally cannot satisfy demographic parity, equal opportunity, and calibration (a given score corresponding to the same observed outcome rate in every group) all at once. Fairness is therefore not a single number; it is a choice of which property matters in the context.
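A small numeric sketch can make the conflict concrete. It rests on a simplifying assumption: calibration is approximated here by precision, the share of positive predictions that are truly positive, which is a weaker stand-in for full score calibration. The base rates and shared rates are hypothetical. By Bayes' rule, precision equals true positive rate times base rate divided by positive prediction rate, so forcing the last two quantities to match across groups leaves precision proportional to the base rate.

```python
def precision(base_rate: float, tpr: float, positive_rate: float) -> float:
    """P(actual positive | predicted positive), from Bayes' rule."""
    return tpr * base_rate / positive_rate


# Hypothetical setup: both groups get the same positive prediction rate
# (demographic parity) and the same true positive rate (equal opportunity).
shared_positive_rate = 0.3
shared_tpr = 0.6

# The groups differ only in how common the positive outcome really is.
base_rates = {"group_a": 0.4, "group_b": 0.2}

ppv = {
    group: precision(rate, shared_tpr, shared_positive_rate)
    for group, rate in base_rates.items()
}

for group, value in ppv.items():
    print(f"{group}: precision = {value:.2f}")
```

With these numbers a positive prediction is right about twice as often in one group as in the other, so the same prediction does not mean the same thing for both groups even though the first two properties hold exactly.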
In practice
First identify the sensitive attribute, then measure the chosen metric for each group and investigate any large gap. Bias often comes from skewed training data or labels, so fixes may need to target data collection and labeling, not just the model. The right metric depends on the harm you are trying to prevent.
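The workflow above might look like the following sketch, assuming equal opportunity is the chosen metric. The records, the group labels, and the 0.2 threshold are all hypothetical; a real threshold depends on the application and on how much data each group has.

```python
from collections import defaultdict

# Hypothetical records: (sensitive attribute value, actual label, predicted label)
records = [
    ("a", 1, 1), ("a", 1, 1), ("a", 1, 0), ("a", 0, 1), ("a", 0, 0),
    ("b", 1, 1), ("b", 1, 0), ("b", 1, 0), ("b", 0, 0), ("b", 0, 0),
]


def tpr_by_group(rows):
    """True positive rate per group: the equal opportunity metric."""
    hits = defaultdict(int)       # actual positives predicted positive
    positives = defaultdict(int)  # all actual positives
    for group, actual, predicted in rows:
        if actual == 1:
            positives[group] += 1
            hits[group] += predicted
    return {group: hits[group] / positives[group] for group in positives}


GAP_THRESHOLD = 0.2  # hypothetical cutoff for "large enough to investigate"

rates = tpr_by_group(records)
gap = max(rates.values()) - min(rates.values())
needs_investigation = gap > GAP_THRESHOLD

print(rates)
print(f"gap = {gap:.3f}, investigate = {needs_investigation}")
```

A flagged gap is a prompt to look at the data and labels behind each group, not a verdict on the model by itself.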
Key idea
Fairness is measured by group-based metrics that can conflict, so you must choose the definition that fits the harm at stake.