Borrowing learned features
Training a deep network from scratch usually requires a very large labeled dataset. Transfer learning instead starts from a network pretrained on a large dataset and adapts it to your task. The early layers already encode general features such as edges and textures, so you can reuse them rather than relearn them.
Two strategies
- Feature extraction: freeze the pretrained backbone and train only a new head. This is fast and safe when your dataset is small.
- Fine-tuning: unfreeze some or all layers and train them at a low learning rate so that pretrained knowledge is refined, not destroyed.
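The two strategies differ only in which parameters receive gradient updates. The following is a minimal, framework-free sketch under stated assumptions: a toy model with one scalar `backbone` weight feeding one scalar `head` weight, trained with plain SGD on squared error. The function `train`, the parameter names, the starting weights, and the learning rates are all illustrative and do not come from any library.

```python
def train(params, data, trainable, lr=0.01, epochs=200):
    """Plain SGD on y_hat = head * (backbone * x) with squared error.

    Only the parameters named in `trainable` are updated; the rest stay frozen.
    """
    p = dict(params)  # copy so the caller's "pretrained" weights are untouched
    for _ in range(epochs):
        for x, y in data:
            feat = p["backbone"] * x          # backbone output (the "feature")
            err = p["head"] * feat - y        # prediction error
            grads = {
                "head": 2 * err * feat,
                "backbone": 2 * err * p["head"] * x,
            }
            for name in trainable:            # frozen parameters are skipped
                p[name] -= lr * grads[name]
    return p


# Hypothetical starting point: a "pretrained" backbone and a fresh head.
pretrained = {"backbone": 1.5, "head": 0.0}
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]   # toy target: y = 3x

# Feature extraction: backbone frozen, only the head trains.
extracted = train(pretrained, data, trainable={"head"})

# Fine-tuning: both parameters train, at a lower learning rate.
fine_tuned = train(pretrained, data, trainable={"head", "backbone"}, lr=0.005)

print("feature extraction:", extracted)
print("fine-tuning:       ", fine_tuned)
```

In the first run the backbone keeps its pretrained value and the head alone absorbs the new task. In the second run both weights move, which is why the lower rate matters there. In a real framework the same choice is usually expressed by marking parameters as trainable or not, rather than by passing a set of names.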
A staged approach
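One plausible reading of a staged approach, assuming it combines the two strategies in sequence, is to train only the new head first and then unfreeze the backbone at a much lower rate. The sketch below expresses that as a per-epoch plan. The function `staged_plan`, the group names, the number of warm-up epochs, and the rates are hypothetical values chosen for illustration.

```python
def staged_plan(epoch, warmup_epochs=5, head_lr=1e-3, finetune_lr=1e-5):
    """Return a {group_name: learning_rate} plan for one epoch.

    A rate of 0.0 means the group is frozen for that epoch.
    """
    if epoch < warmup_epochs:
        # Stage 1: feature extraction. Only the new head trains.
        return {"backbone": 0.0, "head": head_lr}
    # Stage 2: fine-tuning. The backbone is unfrozen at a much lower rate.
    return {"backbone": finetune_lr, "head": head_lr}


for epoch in (0, 4, 5, 9):
    print(epoch, staged_plan(epoch))
```

Training the head first means the backbone is not exposed to the large, noisy gradients that a randomly initialized head produces in its first few epochs.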
Getting it right
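- Use a small learning rate for pretrained layers to avoid catastrophic forgetting, where large updates overwrite the pretrained features.
- Discriminative learning rates let later layers learn faster than early ones, since early features are more general and need less adjustment.
- Watch batch norm statistics. Updating the running mean and variance on a small dataset can produce noisy estimates, while freezing them keeps the pretrained statistics, which may not match the new data.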
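Discriminative rates can be sketched as a simple mapping from layer to learning rate. The example below assumes a geometric decay from the last layer back to the first, which is one common choice and not the only one. The function `discriminative_lrs`, the layer names, and the values of `top_lr` and `decay` are illustrative assumptions.

```python
def discriminative_lrs(layer_names, top_lr=1e-3, decay=0.5):
    """Assign geometrically smaller learning rates to earlier layers.

    `layer_names` is ordered from the earliest layer to the latest.
    The last layer gets `top_lr`; each step toward the input multiplies by `decay`.
    """
    depth = len(layer_names)
    return {
        name: top_lr * decay ** (depth - 1 - i)
        for i, name in enumerate(layer_names)
    }


# Hypothetical layer names, ordered from input to output.
layers = ["conv1", "conv2", "conv3", "head"]
for name, lr in discriminative_lrs(layers).items():
    print(f"{name}: {lr:.6f}")
```

In a framework such as PyTorch, a mapping like this would typically be turned into optimizer parameter groups, one group per layer with its own `lr`.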
Practical notes
- The closer the source and target domains, the more layers you can safely fine-tune.
- With very little data, lean toward feature extraction to avoid overfitting.
Key idea
Transfer learning reuses a pretrained backbone, then either freezes it for feature extraction or fine-tunes it at low, discriminative learning rates. Small rates protect general features from catastrophic forgetting on the new task.