Machine Learning

Transfer Learning and Fine Tuning

Adapting a pretrained network to a new task with the right freezing strategy.

5 min read · advanced

Borrowing learned features

Training a deep network from scratch typically needs a very large labeled dataset. Transfer learning instead starts from a network pretrained on a large dataset and adapts it to your task. Early layers already encode general features like edges and textures, so you can reuse them rather than relearn them.

Two strategies

  • Feature extraction: freeze the pretrained backbone and train only a new head. Fast and safe when your data is small.
  • Fine tuning: unfreeze some or all layers and train them at a low learning rate so pretrained knowledge is refined, not destroyed.

A staged approach
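
A minimal sketch of one common way to stage the two strategies, assuming a toy model rather than a real network: train the new head with the backbone frozen, then unfreeze the backbone and continue at a much lower rate. The `Param` class, the parameter names, the gradients, and both learning rates are made up for illustration; in a framework such as PyTorch the same idea is usually expressed by setting `requires_grad` on parameters.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """One scalar weight plus a flag saying whether training may change it."""
    value: float
    trainable: bool = True


def set_trainable(params, prefix, trainable):
    """Freeze or unfreeze every parameter whose name starts with `prefix`."""
    for name, p in params.items():
        if name.startswith(prefix):
            p.trainable = trainable


def sgd_step(params, grads, lr):
    """Apply one plain SGD update, skipping frozen parameters."""
    for name, p in params.items():
        if p.trainable:
            p.value -= lr * grads[name]


# Hypothetical toy model: two pretrained backbone weights and one new head weight.
params = {
    "backbone.w1": Param(0.5),
    "backbone.w2": Param(-0.3),
    "head.w": Param(0.0),
}
grads = {name: 1.0 for name in params}  # made-up gradients, identical for clarity

# Stage 1, feature extraction: backbone frozen, only the head moves.
set_trainable(params, "backbone.", False)
sgd_step(params, grads, lr=0.1)
after_stage1 = {name: p.value for name, p in params.items()}

# Stage 2, fine tuning: unfreeze everything and use a much smaller rate.
set_trainable(params, "backbone.", True)
sgd_step(params, grads, lr=0.001)
after_stage2 = {name: p.value for name, p in params.items()}
```

After stage 1 only `head.w` has changed. After stage 2 every weight has moved, but by a step 100 times smaller, which is the sense in which a low rate refines pretrained weights instead of overwriting them.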

Getting it right

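  • Use a small learning rate for pretrained layers to avoid catastrophic forgetting, where large updates overwrite what the network already knows.
  • Discriminative learning rates give later layers a larger rate than early ones, since early features are more general and need less change.
  • Watch batch norm statistics: on a small dataset, updating the running mean and variance can pull them away from the pretrained values, while freezing them keeps the pretrained statistics.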
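
The last two points can be sketched in plain Python, under stated assumptions. `discriminative_lrs` is a hypothetical helper that gives the last layer the full rate and shrinks it by a constant factor for each earlier layer; the geometric schedule and the numbers are illustrative choices, not something this lesson prescribes. `update_running_mean` assumes the exponential moving average form that batch norm layers commonly use for their running statistics.

```python
def discriminative_lrs(num_layers, top_lr, decay):
    """Return one learning rate per layer; index 0 is the earliest layer.

    The last layer gets `top_lr`; each earlier layer gets the next
    layer's rate multiplied by `decay` (0 < decay <= 1).
    """
    return [top_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]


def update_running_mean(running, batch, momentum=0.1, frozen=False):
    """Return the new running mean after seeing one batch.

    With `frozen=True` the pretrained statistic is kept unchanged.
    """
    if frozen:
        return running
    batch_mean = sum(batch) / len(batch)
    return (1 - momentum) * running + momentum * batch_mean


# Illustrative values: 4 layers, rate halves for each step toward the input.
lrs = discriminative_lrs(num_layers=4, top_lr=1e-3, decay=0.5)

# A tiny, unrepresentative batch pulls the pretrained mean (0.0) toward 6.0.
small_batch = [5.0, 7.0]
updated = 0.0
for _ in range(20):
    updated = update_running_mean(updated, small_batch)
kept = update_running_mean(0.0, small_batch, frozen=True)
```

Here the earliest layer trains at one eighth of the last layer's rate. Twenty updates on the two-sample batch move the running mean most of the way from 0.0 toward 6.0, while the frozen version stays at 0.0.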

Practical notes

  • The closer the source and target domains, the more layers you can safely fine tune.
  • With very little data, lean toward feature extraction to avoid overfitting.

Key idea

Transfer learning reuses a pretrained backbone, then either freezes it for feature extraction or fine tunes it at a low, discriminative rate. Small rates protect general features from catastrophic forgetting on the new task.

Check yourself

1. Why use a small learning rate when fine tuning pretrained layers?

2. When should you prefer feature extraction over full fine tuning?