Bias Mitigation Preprocessing

Fix the data first

Preprocessing mitigation changes the training data so that any model trained on it tends to be fairer. The model and training algorithm stay untouched, which makes these methods flexible and easy to bolt onto an existing pipeline.

Common techniques

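Reweighting: assign each sample a weight so that, in the weighted data, every group and label combination appears as often as it would if group and label were independent, countering historical imbalance.
Resampling: oversample underrepresented group and label combinations or undersample overrepresented ones.
Relabeling: flip a small number of labels on the examples closest to the decision boundary, where a change is least disruptive, to reduce the measured bias.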
Representation learning: transform features into a space where the protected attribute is hard to recover.
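
As a minimal sketch of the reweighting idea, the snippet below gives each sample the weight P(group) * P(label) / P(group, label), estimated from counts. The function names and the toy data are made up for illustration, and this is one common way to define the weights, not the only one.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Return one weight per sample: P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(groups, labels, weights, group):
    """Weighted share of positive labels (label == 1) within one group."""
    total = sum(w for g, w in zip(groups, weights) if g == group)
    positive = sum(
        w for g, y, w in zip(groups, labels, weights) if g == group and y == 1
    )
    return positive / total


# Toy data (hypothetical): group "a" receives the positive label more often.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

weights = reweighing_weights(groups, labels)

# After reweighting, both groups have the same weighted positive rate.
rate_a = weighted_positive_rate(groups, labels, weights, "a")
rate_b = weighted_positive_rate(groups, labels, weights, "b")
```

Many training libraries accept per-sample weights at fit time, so weights like these can usually be passed through without touching the model itself.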

Strengths and limits

Preprocessing is model-agnostic and keeps downstream training simple. But it runs without knowing what the model will do with the data, so it cannot guarantee that a specific fairness metric is met, and aggressive edits can distort the data.
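
Because preprocessing offers no guarantee, it can help to measure the chosen metric on the trained model's predictions. The sketch below computes one possible metric, the demographic parity difference (the gap in positive prediction rates between groups). The function names and the predictions are hypothetical, and other fairness metrics may suit a given task better.

```python
def positive_rate(predictions, groups, group):
    """Share of positive predictions (1) among samples in one group."""
    selected = [p for p, g in zip(predictions, groups) if g == group]
    return sum(selected) / len(selected)


def demographic_parity_difference(predictions, groups):
    """Largest gap in positive prediction rate between any two groups."""
    rates = [positive_rate(predictions, groups, g) for g in set(groups)]
    return max(rates) - min(rates)


# Hypothetical predictions from a model trained on preprocessed data.
predictions = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

# A gap of 0 means equal positive rates; larger values mean a larger disparity.
gap = demographic_parity_difference(predictions, groups)
```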

Key idea

Preprocessing mitigation reweights, resamples, relabels, or transforms the data so that any downstream model tends to be fairer. It is a model-agnostic fix, but it cannot guarantee that a chosen fairness metric is met.
