Catastrophic Forgetting

Losing what was learned

When a model is fine-tuned on a new task, its weights shift to fit the new data. If that shift is too large, the model can lose abilities it had before. This is catastrophic forgetting: gaining a new skill at the cost of old ones.

Why it happens

Weights are shared across tasks, so adjusting them for new data overwrites old patterns.
A narrow fine-tuning set pulls the model away from its broad prior knowledge.
High learning rates and long training amplify the drift.
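
The effect can be reproduced on a toy problem. The sketch below is illustrative only: it assumes a model with a single shared weight w predicting y = w * x, an old task A (y = 2x), and a new task B (y = -x). The task definitions, learning rates, and step counts are all made-up values chosen for the demonstration. It trains on A, then fine-tunes on B once gently and once aggressively, and compares how much the loss on A grows.

```python
# Toy demonstration: one shared weight, two conflicting tasks (all values are assumptions).

def mse(w, data):
    """Mean squared error of the model y = w * x over (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def grad(w, data):
    """Gradient of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

def train(w, data, lr, steps):
    """Plain gradient descent starting from w."""
    for _ in range(steps):
        w -= lr * grad(w, data)
    return w

xs = [-1.0, -0.5, 0.5, 1.0]
task_a = [(x, 2.0 * x) for x in xs]   # old task: y = 2x
task_b = [(x, -1.0 * x) for x in xs]  # new task: y = -x

# "Pre-training" on task A.
w_pre = train(0.0, task_a, lr=0.1, steps=200)
loss_a_before = mse(w_pre, task_a)

# Fine-tuning on task B from the same starting point, with two settings.
w_gentle = train(w_pre, task_b, lr=0.01, steps=10)       # small rate, short run
w_aggressive = train(w_pre, task_b, lr=0.1, steps=200)   # large rate, long run

print("loss on A before fine-tuning:    ", loss_a_before)
print("loss on A after gentle tuning:   ", mse(w_gentle, task_a))
print("loss on A after aggressive tuning:", mse(w_aggressive, task_a))
```

Because both tasks share the only weight, every step toward task B is a step away from task A. The aggressive run ends near the task B solution and has the largest loss on A, while the gentle run moves less and forgets less, at the price of fitting B less well.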

The drift

Ways to reduce it

Lower learning rates and fewer epochs limit how far weights move.
Replaying some original or diverse data keeps old skills active.
Parameter-efficient methods freeze the backbone and train only a small set of added weights, so the original weights are left untouched.
Regularization can penalize moving weights that mattered for prior tasks.
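
Three of these strategies can be sketched on a toy model with one backbone weight. Everything here is an assumption made for illustration: the tasks (A: y = 2x, B: y = -x), the starting weight W_PRE, the penalty strength, and the helper names finetune and train_scale_only. The regularizer shown is a simple squared pull toward the old weight, not a specific published method, and the frozen-backbone variant uses a hypothetical task-specific scale s as its only trainable parameter.

```python
# Toy comparison of mitigation strategies (all names and values are assumptions).

def mse(predict, data):
    """Mean squared error of a prediction function over (x, y) pairs."""
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)

xs = [-1.0, -0.5, 0.5, 1.0]
task_a = [(x, 2.0 * x) for x in xs]   # old task: y = 2x
task_b = [(x, -1.0 * x) for x in xs]  # new task: y = -x
W_PRE = 2.0                           # assumed weight after training on task A

def finetune(data, lr=0.1, steps=200, anchor_strength=0.0):
    """Gradient descent on w from W_PRE, optionally pulled back toward W_PRE."""
    w = W_PRE
    for _ in range(steps):
        g = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        # Gradient of the penalty anchor_strength * (w - W_PRE) ** 2.
        g += 2 * anchor_strength * (w - W_PRE)
        w -= lr * g
    return w

def train_scale_only(data, lr=0.1, steps=200):
    """Keep W_PRE frozen and train only a new task-specific scale s."""
    s = 1.0
    for _ in range(steps):
        g = sum(2 * (s * W_PRE * x - y) * W_PRE * x for x, y in data) / len(data)
        s -= lr * g
    return s

w_naive = finetune(task_b)                         # no mitigation
w_replay = finetune(task_b + task_a)               # replay: mix old data back in
w_anchor = finetune(task_b, anchor_strength=1.0)   # penalize moving away from W_PRE
s_b = train_scale_only(task_b)                     # frozen backbone, small add-on

for name, w in [("naive", w_naive), ("replay", w_replay), ("anchor", w_anchor)]:
    print(name, "loss on A:", mse(lambda x: w * x, task_a),
          "loss on B:", mse(lambda x: w * x, task_b))

# With a frozen backbone, task A still uses W_PRE directly; task B uses s_b * W_PRE.
print("frozen loss on A:", mse(lambda x: W_PRE * x, task_a),
      "loss on B:", mse(lambda x: s_b * W_PRE * x, task_b))
```

In this deliberately tiny setting the two tasks compete for a single weight, so replay and the penalty can only trade new-task fit for retention. The frozen-backbone variant avoids the trade by leaving the shared weight alone and putting the new behavior in a separate parameter.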

Key idea

Catastrophic forgetting is the loss of prior abilities when fine-tuning overwrites shared weights, and it is mitigated by gentle updates, data replay, frozen backbones, or regularization.

Check yourself