Full Fine-Tuning

Starting from a pretrained model

A model trained on a huge general corpus already knows a great deal about language or images. Full fine-tuning continues training that model on a smaller, task-specific dataset, updating every parameter so the model specializes.
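
To make "updating every parameter" concrete, the sketch below counts the parameters of a small fully connected network and treats all of its layers as trainable, which is what full fine-tuning does. The layer sizes in `LAYER_SHAPES` are hypothetical and only serve the example.

```python
# Hypothetical layer sizes (in_features, out_features) for a small pretrained MLP.
LAYER_SHAPES = [(784, 256), (256, 10)]


def layer_params(n_in: int, n_out: int) -> int:
    """Parameters in one fully connected layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out


def count_params(shapes, trainable_layers=None) -> int:
    """Count parameters, optionally restricted to a set of trainable layer indices."""
    return sum(
        layer_params(n_in, n_out)
        for i, (n_in, n_out) in enumerate(shapes)
        if trainable_layers is None or i in trainable_layers
    )


total = count_params(LAYER_SHAPES)
# Full fine-tuning: every layer is trainable, so the trainable count equals the total.
full_ft = count_params(LAYER_SHAPES, trainable_layers=set(range(len(LAYER_SHAPES))))
print(total, full_ft)  # 203530 203530
```

Every one of those parameters receives a gradient and an update during training.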

Why it works

The pretrained weights give a strong starting point, so far less data is needed than training from scratch.
A smaller learning rate than in pretraining is used, so the model adjusts what it already knows rather than forgetting it.
Gradients flow through the whole network, letting all layers adapt.
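
The last two points can be seen at small scale. The following is a minimal sketch under stated assumptions: it uses a hypothetical two-layer scalar network `y = w2 * tanh(w1 * x)` with made-up "pretrained" weights and a squared-error loss. It computes the gradient for both layers by the chain rule, so the layer-1 gradient visibly passes back through layer 2. It then takes one SGD step at two learning rates. A single step moves the weights a distance proportional to the learning rate, so a small rate keeps the model near its pretrained starting point.

```python
import math

# Hypothetical "pretrained" weights for a two-layer scalar network y = w2 * tanh(w1 * x).
PRETRAINED = {"w1": 0.5, "w2": 1.5}


def forward_and_grads(params, x, target):
    """Return squared-error loss and its gradient for BOTH layers (chain rule)."""
    h = math.tanh(params["w1"] * x)  # layer 1 activation
    y = params["w2"] * h             # layer 2 output
    err = y - target
    loss = 0.5 * err ** 2
    grads = {
        "w2": err * h,                                  # dL/dw2
        "w1": err * params["w2"] * (1.0 - h ** 2) * x,  # dL/dw1, flows back through layer 2
    }
    return loss, grads


def sgd_step(params, grads, lr):
    """Update every parameter; nothing is frozen in full fine-tuning."""
    return {name: value - lr * grads[name] for name, value in params.items()}


def drift(a, b):
    """Euclidean distance between two parameter dicts."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))


loss, grads = forward_and_grads(PRETRAINED, x=2.0, target=0.3)
print({k: round(v, 4) for k, v in grads.items()})  # both layers receive a nonzero gradient

small = sgd_step(PRETRAINED, grads, lr=1e-3)
large = sgd_step(PRETRAINED, grads, lr=1.0)
print(drift(small, PRETRAINED) < drift(large, PRETRAINED))  # True
```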

The training flow
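
The flow can be sketched as a short loop. This is a minimal illustration, not a reference implementation. It assumes a one-feature linear model (`y = w * x + b`), plain per-example SGD, and a made-up toy dataset. The names `PRETRAINED`, `TASK_DATA`, and `full_finetune` are hypothetical.

```python
import copy
import random

# Hypothetical pretrained weights and a tiny task-specific dataset of (x, target) pairs.
PRETRAINED = {"w": 1.0, "b": 0.0}
TASK_DATA = [(x, 2.0 * x + 1.0) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]


def mse(params, data):
    """Mean squared error of the linear model on a dataset."""
    return sum((params["w"] * x + params["b"] - t) ** 2 for x, t in data) / len(data)


def full_finetune(pretrained, data, lr=0.05, epochs=200, seed=0):
    """Continue training from the pretrained weights, updating every parameter."""
    params = copy.deepcopy(pretrained)  # 1. start from a copy of the pretrained model
    rng = random.Random(seed)
    examples = list(data)
    for _ in range(epochs):             # 2. loop over the task data
        rng.shuffle(examples)
        for x, t in examples:
            err = params["w"] * x + params["b"] - t   # 3. forward pass and error
            grads = {"w": 2 * err * x, "b": 2 * err}  # 4. gradients for all parameters
            for name in params:                       # 5. update all parameters
                params[name] -= lr * grads[name]
    return params                       # 6. a complete new set of weights, saved per task


tuned = full_finetune(PRETRAINED, TASK_DATA)
print(mse(PRETRAINED, TASK_DATA), mse(tuned, TASK_DATA))  # loss before vs. after fine-tuning
```

The deep copy keeps the pretrained weights untouched. The returned dict is the "complete new copy" that each task ends up storing.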

The cost

Full fine-tuning produces a complete new copy of the model. For large models this is expensive in both memory and storage. During training, gradients and optimizer states must be held for every weight. Afterwards, each task needs its own full checkpoint. These costs motivate the parameter-efficient methods that follow.
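
A back-of-the-envelope estimate shows how quickly these costs grow. This sketch rests on several assumptions. Every value is stored in 32-bit floats (4 bytes). The optimizer is Adam-style and keeps two extra states per weight. Activation memory is ignored. The 7-billion-parameter size and the ten tasks are hypothetical.

```python
BYTES_FP32 = 4  # assumption: every stored value is a 32-bit float


def full_ft_training_bytes(n_params: int, optimizer_states: int = 2,
                           bytes_per_value: int = BYTES_FP32) -> int:
    """Weights + gradients + optimizer states, held for EVERY parameter.

    Assumes an Adam-style optimizer (2 states per weight); ignores activations.
    """
    values_per_param = 1 + 1 + optimizer_states  # weight, gradient, optimizer states
    return n_params * values_per_param * bytes_per_value


def checkpoint_bytes(n_params: int, n_tasks: int,
                     bytes_per_value: int = BYTES_FP32) -> int:
    """Each task keeps its own full copy of the weights."""
    return n_params * bytes_per_value * n_tasks


n = 7_000_000_000  # hypothetical model size
print(full_ft_training_bytes(n) / 1e9, "GB during training")            # prints: 112.0 GB during training
print(checkpoint_bytes(n, n_tasks=10) / 1e9, "GB for 10 task checkpoints")  # prints: 280.0 GB for 10 task checkpoints
```

Under these assumptions, training needs four values per parameter, or 16 bytes. Storage then grows linearly with the number of tasks.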

Key idea

Full fine-tuning updates all weights of a pretrained model on task data, giving strong adaptation at the cost of a full model copy and heavy memory use per task.
