Losing what was learned
When a model is fine-tuned on a new task, its weights shift to fit the new data. If that shift is too large, the model can lose abilities it had before. This is catastrophic forgetting: gaining a new skill at the cost of old ones.
Why it happens
- Weights are shared across tasks, so adjusting them for new data can overwrite the patterns that earlier tasks relied on.
- A narrow fine-tuning set pulls the model away from its broad prior knowledge.
- High learning rates and long training amplify the drift.
The drift
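The causes listed above can be seen in a very small model. The sketch below is a toy illustration, not a benchmark: a two-weight linear model is first fitted to a "broad" task A, then fine-tuned only on a single-example task B. The data, the helper names (`predict`, `mse`, `train`, `drift`), and the hyperparameters are all assumptions chosen for illustration.

```python
# Toy linear model: prediction = w[0]*x[0] + w[1]*x[1].
# Both tasks share the same two weights, which is what makes forgetting possible.

def predict(w, x):
    return w[0] * x[0] + w[1] * x[1]

def mse(w, data):
    """Mean squared error of weights w on a list of (x, y) pairs."""
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)

def train(w, data, lr, steps):
    """Plain full-batch gradient descent on MSE; returns new weights."""
    w = list(w)
    for _ in range(steps):
        grad = [0.0, 0.0]
        for x, y in data:
            err = predict(w, x) - y
            grad[0] += 2 * err * x[0] / len(data)
            grad[1] += 2 * err * x[1] / len(data)
        w = [w[0] - lr * grad[0], w[1] - lr * grad[1]]
    return w

def drift(w_new, w_old):
    """Euclidean distance between two weight vectors."""
    return sum((a - b) ** 2 for a, b in zip(w_new, w_old)) ** 0.5

# Hypothetical data: task A constrains two directions, task B is one narrow example.
task_a = [((1.0, 0.0), 1.0), ((1.0, 1.0), 3.0)]
task_b = [((1.0, 0.5), 0.0)]

w_pre = train([0.0, 0.0], task_a, lr=0.1, steps=2000)  # "pretraining" on A
w_ft = train(w_pre, task_b, lr=0.1, steps=200)          # fine-tuning on B only

print("task A loss before fine-tuning:", round(mse(w_pre, task_a), 4))
print("task A loss after fine-tuning: ", round(mse(w_ft, task_a), 4))
print("task B loss after fine-tuning: ", round(mse(w_ft, task_b), 4))

# Drift from the pretrained weights grows with learning rate and training length.
for lr, steps in [(0.01, 5), (0.01, 50), (0.1, 50)]:
    w = train(w_pre, task_b, lr=lr, steps=steps)
    print("lr", lr, "steps", steps,
          "drift", round(drift(w, w_pre), 3),
          "task A loss", round(mse(w, task_a), 3))
```

In this toy setting the model fits task B while its task A loss rises, and the distance from the pretrained weights increases as the learning rate or the number of steps goes up. Real networks are far less tidy, but the mechanism is the same.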
Ways to reduce it
- Lower learning rates and fewer epochs limit how far weights move.
- Replaying some of the original training data, or other diverse data, alongside the new examples keeps old skills exercised during fine-tuning.
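- Parameter-efficient methods freeze the backbone and train only a small set of added parameters, so the original weights are preserved by construction. The adapted model's behavior can still drift, though typically less than under full fine-tuning.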
- Regularization can penalize moving weights that mattered for prior tasks.
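The four mitigations above can be compared side by side on a toy problem. This is a minimal sketch under stated assumptions, not a recipe for real models: a three-weight linear model where weights 0 and 1 play the role of the "backbone" and weight 2 is a spare parameter that task A never uses. The data, the `fine_tune` helper, and the importance estimate (mean squared input per weight on task A, standing in for a proper importance measure such as the one used in elastic weight consolidation) are all hypothetical.

```python
def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def mse(w, data):
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)

def fine_tune(w_start, data, lr, steps, importance=None, strength=0.0, frozen=()):
    """Gradient descent on MSE plus strength * sum_i importance[i] * (w[i] - w_start[i])**2.

    Weight indices listed in `frozen` are never updated.
    """
    w = list(w_start)
    importance = importance or [0.0] * len(w)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in data:
            err = predict(w, x) - y
            for i, xi in enumerate(x):
                grad[i] += 2 * err * xi / len(data)
        for i in range(len(w)):
            grad[i] += 2 * strength * importance[i] * (w[i] - w_start[i])
        w = [wi if i in frozen else wi - lr * g
             for i, (wi, g) in enumerate(zip(w, grad))]
    return w

# Hypothetical data. The third input feature is zero for task A,
# so weight 2 is unused there and free to specialize on task B.
task_a = [((1.0, 0.0, 0.0), 1.0), ((1.0, 1.0, 0.0), 3.0)]
task_b = [((1.0, 0.5, 1.0), 0.0)]

w_pre = fine_tune([0.0, 0.0, 0.0], task_a, lr=0.1, steps=2000)
loss_a_pre = mse(w_pre, task_a)

# Assumed importance proxy: mean squared input seen by each weight on task A.
importance = [sum(x[i] ** 2 for x, _ in task_a) / len(task_a) for i in range(3)]

runs = {
    "naive":   fine_tune(w_pre, task_b, lr=0.1, steps=200),
    "gentle":  fine_tune(w_pre, task_b, lr=0.01, steps=20),            # small lr, few steps
    "replay":  fine_tune(w_pre, task_b + task_a, lr=0.1, steps=200),   # mix old data back in
    "penalty": fine_tune(w_pre, task_b, lr=0.1, steps=200,
                         importance=importance, strength=1.0),         # anchor important weights
    "frozen":  fine_tune(w_pre, task_b, lr=0.1, steps=200, frozen=(0, 1)),  # train only weight 2
}

# Map each strategy to (task A loss, task B loss).
results = {name: (mse(w, task_a), mse(w, task_b)) for name, w in runs.items()}
for name, (loss_a, loss_b) in results.items():
    print(f"{name:8s} task A loss {loss_a:.3f}   task B loss {loss_b:.3f}")
```

The toy is deliberately favorable to replay, the penalty, and freezing, because a spare weight exists that task A does not depend on. The gentle run shows the usual trade-off: it forgets less than the naive run but also learns less of task B.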
Key idea
Catastrophic forgetting is the loss of prior abilities when fine-tuning overwrites shared weights, and it is mitigated by gentle updates, data replay, frozen backbones, or regularization.