Transfer Learning and Fine Tuning

Borrowing learned features

Training a deep network from scratch usually needs a very large labeled dataset. Transfer learning instead starts from a network pretrained on a large dataset and adapts it to your task. Early layers already encode general features such as edges and textures, so you can reuse them rather than learn them again.

Two strategies

Feature extraction: freeze the pretrained backbone and train only a new head. Fast and safe when your data is small.
Fine tuning: unfreeze some or all layers and train them at a low learning rate so pretrained knowledge is refined, not destroyed.
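
Feature extraction can be sketched in PyTorch, which is an assumed framework here since none is named above. The small `backbone` is a hypothetical stand-in for a real pretrained network, and `num_classes` and the learning rate are illustrative values. Freezing amounts to turning off `requires_grad` on the backbone and handing only the head's parameters to the optimizer.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for a pretrained backbone. In practice you would load
# real pretrained weights here instead of random ones.
backbone = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),  # output: (batch, 16) feature vectors
)

num_classes = 5                    # assumed label count for the new task
head = nn.Linear(16, num_classes)  # new, randomly initialised task head

# Feature extraction: freeze every backbone parameter ...
for p in backbone.parameters():
    p.requires_grad = False

model = nn.Sequential(backbone, head)
backbone.eval()  # ... and keep BatchNorm running statistics fixed as well

# Only the head's parameters are handed to the optimizer.
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

# One training step on random stand-in data.
x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, num_classes, (8,))
loss = nn.functional.cross_entropy(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

For fine tuning you would instead leave some or all backbone parameters trainable and pass them to the optimizer with a much lower learning rate than the head.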

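A staged approach

A common pattern combines the two strategies: first train the new head with the backbone frozen, then unfreeze some or all backbone layers and continue training at a low learning rate. Starting the second stage with an already trained head keeps large, noisy gradients from a random head out of the pretrained layers.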
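
A minimal PyTorch sketch of a two-stage schedule (frozen backbone first, then everything unfrozen at a lower rate). The tiny fully connected `backbone`, the random data, the step counts, and the rates `1e-3` and `1e-5` are all illustrative assumptions, and `set_trainable` and `train` are hypothetical helpers, not library functions.

```python
import torch
from torch import nn

torch.manual_seed(0)

def set_trainable(module: nn.Module, trainable: bool) -> None:
    """Toggle requires_grad on every parameter of `module` (hypothetical helper)."""
    for p in module.parameters():
        p.requires_grad = trainable

# Hypothetical pretrained backbone and new head (random weights here).
backbone = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 32), nn.ReLU())
head = nn.Linear(32, 3)
model = nn.Sequential(backbone, head)

# Fixed random stand-in data for the new task.
x = torch.randn(64, 20)
y = torch.randint(0, 3, (64,))

def train(optimizer: torch.optim.Optimizer, steps: int) -> float:
    """Run a few full-batch steps and return the final loss."""
    for _ in range(steps):
        loss = nn.functional.cross_entropy(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return loss.item()

# Stage 1: backbone frozen, only the new head learns.
set_trainable(backbone, False)
before = backbone[0].weight.detach().clone()
train(torch.optim.Adam(head.parameters(), lr=1e-3), steps=20)
backbone_unchanged = torch.equal(before, backbone[0].weight)

# Stage 2: unfreeze the backbone and fine tune everything at a lower rate.
set_trainable(backbone, True)
train(torch.optim.Adam(model.parameters(), lr=1e-5), steps=20)
backbone_changed = not torch.equal(before, backbone[0].weight)
```

A fresh optimizer is created for the second stage so that it tracks the newly unfrozen parameters with their own learning rate; unfreezing only the last few blocks in stage two is an equally valid variant.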

Getting it right

Use a small learning rate for pretrained layers to avoid catastrophic forgetting.
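Discriminative (layer-wise) learning rates let later layers learn faster than early ones, since early features are more general and need less change.
Watch batch norm running statistics. On a small dataset, updating them with noisy per-batch estimates can shift the pretrained features, so decide deliberately whether to freeze them or let them adapt.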
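
Both tips can be expressed in PyTorch (again an assumed framework) with optimizer parameter groups and per-module train/eval modes. The `early`/`late`/`head` split, `base_lr`, and the factor-of-ten spacing between groups are illustrative assumptions; `freeze_bn_stats` is a hypothetical helper.

```python
import torch
from torch import nn

# Hypothetical network: an early and a late backbone block plus a new head.
early = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU())
late = nn.Sequential(nn.Conv2d(8, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU())
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 4))
model = nn.Sequential(early, late, head)

base_lr = 1e-3  # assumed rate for the new head
# Discriminative rates: one parameter group per block, smaller rates for
# earlier (more general) blocks. The 10x spacing is an illustrative choice.
optimizer = torch.optim.SGD(
    [
        {"params": early.parameters(), "lr": base_lr / 100},
        {"params": late.parameters(), "lr": base_lr / 10},
        {"params": head.parameters(), "lr": base_lr},
    ],
    lr=base_lr,
    momentum=0.9,
)

BN_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)

def freeze_bn_stats(module: nn.Module) -> None:
    """Switch BatchNorm layers to eval mode so running mean/var stop updating.

    The BN affine weights stay trainable at their group's learning rate.
    """
    for m in module.modules():
        if isinstance(m, BN_TYPES):
            m.eval()

model.train()           # everything else stays in training mode ...
freeze_bn_stats(model)  # ... while pretrained BN statistics are kept fixed
```

Because `model.train()` switches every BatchNorm layer back to training mode, the freeze has to be reapplied after each call to it, for example at the start of every epoch.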

Practical notes

The closer the source and target domains, the more layers you can safely fine tune.
With very little data, lean toward feature extraction to avoid overfitting.

Key idea

Transfer learning reuses a pretrained backbone, then either freezes it for feature extraction or fine tunes it at a low, discriminative rate. Small rates protect general features from catastrophic forgetting on the new task.