Fix the data first
Preprocessing mitigation changes the training data so that any model trained on it tends to be fairer. The model and training algorithm stay untouched, which makes these methods flexible and easy to bolt onto an existing pipeline.
Common techniques
- Reweighting: assign sample weights so that every combination of group and label carries balanced weight during training, countering historical imbalance.
- Resampling: oversample underrepresented group and label combinations or undersample overrepresented ones.
- Relabeling: flip a small number of labels near the decision boundary to reduce the bias encoded in the labels.
- Representation learning: transform features into a space where the protected attribute is hard to recover.
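As a rough illustration of reweighting, the sketch below uses one common scheme, which is an assumption here and not necessarily the only reading of "balanced": each sample gets the weight P(group) * P(label) / P(group, label), so that label and group look independent in the weighted data. The function names and the toy data are hypothetical.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Return one weight per sample: P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(groups, labels, weights, group):
    """Weighted share of positive labels (label == 1) within one group."""
    total = sum(w for g, w in zip(groups, weights) if g == group)
    positive = sum(
        w for g, y, w in zip(groups, labels, weights) if g == group and y == 1
    )
    return positive / total


# Hypothetical toy data: group "a" receives the positive label more often than "b".
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

weights = reweighing_weights(groups, labels)
rate_a = weighted_positive_rate(groups, labels, weights, "a")
rate_b = weighted_positive_rate(groups, labels, weights, "b")
```

After weighting, both groups have the same weighted positive rate even though the raw rates differ. Many training APIs accept per-sample weights, so a list like `weights` can typically be passed along without changing the model itself.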
Strengths and limits
Preprocessing is model-agnostic and keeps downstream training simple. But it is blind to what the model will actually learn, so it cannot guarantee that a specific fairness metric is met, and aggressive edits can distort the data.
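Because the data fix comes with no guarantee, it can help to measure the chosen metric on the trained model's predictions. The sketch below assumes demographic parity (equal selection rates across groups) as the metric, which is only one possible choice; the function names and the predictions are hypothetical.

```python
def selection_rates(groups, predictions):
    """Share of positive predictions (prediction == 1) per group."""
    totals, positives = {}, {}
    for g, p in zip(groups, predictions):
        totals[g] = totals.get(g, 0) + 1
        positives[g] = positives.get(g, 0) + (1 if p == 1 else 0)
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_difference(groups, predictions):
    """Gap between the highest and lowest group selection rate (0 means parity)."""
    rates = selection_rates(groups, predictions)
    return max(rates.values()) - min(rates.values())


# Hypothetical predictions from a model trained on preprocessed data.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
predictions = [1, 1, 0, 0, 1, 0, 0, 0]

gap = demographic_parity_difference(groups, predictions)
```

A nonzero `gap` on held-out data would indicate that the model still treats the groups differently on this metric, even if the training data was balanced beforehand.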
Key idea
Preprocessing mitigation reweights, resamples, relabels, or transforms the data so that any downstream model tends to be fairer, offering a model-agnostic fix that cannot guarantee a chosen fairness metric.