LoRA Fine-Tuning

Adapting a frozen model by training small low-rank update matrices instead of all of its weights.

The problem with full fine-tuning

Fine-tuning every weight of a large model requires memory for the weights, their gradients, and the optimizer state, which together can take several times the size of the model itself. LoRA (low-rank adaptation) avoids this by freezing the original weights and training only small add-on matrices.
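To make the memory cost concrete, here is a rough back-of-the-envelope estimate. It is a sketch under stated assumptions, not a measurement: every value is stored in 32-bit floats, the optimizer is Adam (which keeps two extra values per weight), activations are ignored, and the 7 billion parameter model size is a hypothetical example.

```python
def full_finetune_bytes(num_params, bytes_per_value=4):
    """Estimate training memory for full fine-tuning (activations ignored)."""
    weights = num_params * bytes_per_value          # the model itself
    grads = num_params * bytes_per_value            # one gradient per weight
    adam_state = 2 * num_params * bytes_per_value   # Adam's two moment estimates (assumed optimizer)
    return weights + grads + adam_state


# Hypothetical model size, chosen only for illustration.
num_params = 7_000_000_000
total_gb = full_finetune_bytes(num_params) / 1e9
print(f"about {total_gb:.0f} GB before activations")
```

Under these assumptions the weights account for only a quarter of the total, which is why freezing them and dropping their gradients and optimizer state saves so much.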

Low-rank updates

LoRA assumes the change a task needs is low-rank, meaning it lives in a small subspace. For a weight matrix W of shape d × k, it learns two thin matrices, B (d × r) and A (r × k) with rank r much smaller than d and k, whose product BA forms the update:

Keep the original weights frozen.
Add a learned update equal to the product of two small matrices.
Only those small matrices receive gradients.
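The three steps above can be sketched in dependency-free Python. This is a minimal illustration, not a library implementation: the class name `LoRALinear`, the squared-error loss, the plain SGD step, and the initialization (small random A, zero B, so the update starts at zero) are all assumptions made for the example, and the scaling factor that LoRA implementations commonly apply to the update is left out.

```python
import random


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class LoRALinear:
    """Hypothetical layer computing y = W x + B (A x) with W frozen."""

    def __init__(self, W, rank, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # frozen: never modified below
        # Assumed init: small random A, zero B, so B @ A is zero at the start.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.W, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + d for b, d in zip(base, delta)]

    def sgd_step(self, x, target, lr=0.1):
        """One SGD step on L = 0.5 * ||y - target||^2, updating only A and B."""
        h = matvec(self.A, x)
        err = [y - t for y, t in zip(self.forward(x), target)]  # dL/dy
        # dL/dh = B^T err, computed before B is changed.
        bt_err = [sum(self.B[i][j] * err[i] for i in range(len(err)))
                  for j in range(len(h))]
        for i, e in enumerate(err):          # dL/dB[i][j] = err[i] * h[j]
            for j, hj in enumerate(h):
                self.B[i][j] -= lr * e * hj
        for j, g in enumerate(bt_err):       # dL/dA[j][k] = bt_err[j] * x[k]
            for k, xk in enumerate(x):
                self.A[j][k] -= lr * g * xk


W = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
W_before = [row[:] for row in W]
layer = LoRALinear(W, rank=1)
x = [1.0, 0.0, -1.0]
start = layer.forward(x)  # equals W x because B starts at zero
layer.sgd_step(x, target=[0.0, 0.0])
```

After the step, `layer.W` still equals `W_before`, while `layer.B` has moved away from zero: only the two small matrices were trained.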

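Because the two matrices are thin compared to the full weight matrix, the trainable parameter count drops sharply: a d × k matrix with a rank r update trains r × (d + k) values instead of d × k, which can be a reduction of orders of magnitude when r is small.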
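A quick calculation shows the size of the saving for a single matrix. The dimensions and rank below are hypothetical values picked for illustration, not figures from any particular model.

```python
def full_params(d, k):
    """Trainable values when the whole d x k matrix is fine-tuned."""
    return d * k


def lora_params(d, k, r):
    """Trainable values for a rank-r update: a d x r matrix plus an r x k matrix."""
    return r * (d + k)


# Hypothetical layer size and rank.
d, k, r = 4096, 4096, 8
full = full_params(d, k)
lora = lora_params(d, k, r)
print(full, lora, full // lora)  # 16777216 65536 256
```

In this example the update trains 256 times fewer values than the full matrix, and the gap widens as the matrix grows or the rank shrinks.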

Practical benefits

Far less memory, so fine-tuning fits on smaller GPUs.
Many task-specific adapters can share one frozen base model.
Adapters can be merged into the base weights before inference, so the merged model runs with no extra cost.
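Merging works because adding the product of the two small matrices to the frozen weight gives a single matrix of the original shape. The sketch below checks this on tiny hand-picked integer matrices; the names `W`, `B`, `A`, and the helper functions are illustrative choices, not part of any library.

```python
def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(X, Y):
    """Multiply matrix X by matrix Y (both lists of rows)."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


W = [[1, 2], [3, 4]]  # frozen base weight (2 x 2)
B = [[1], [2]]        # adapter matrix (2 x 1)
A = [[3, 4]]          # adapter matrix (1 x 2)
x = [1, 1]

# Unmerged: base output plus the adapter's output, computed separately.
unmerged = [w + d for w, d in zip(matvec(W, x), matvec(B, matvec(A, x)))]

# Merged: fold B @ A into the weight once, then do a single multiply.
BA = matmul(B, A)
W_merged = [[w + u for w, u in zip(w_row, u_row)] for w_row, u_row in zip(W, BA)]
merged = matvec(W_merged, x)

print(unmerged, merged)  # [10, 21] [10, 21]
```

Keeping `B` and `A` separate instead is what lets several adapters share one copy of `W`: swapping tasks means swapping only the small matrices.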

Key idea

LoRA freezes the base model and learns a low-rank update as the product of two small matrices, cutting the number of trainable parameters dramatically while typically keeping quality close to that of full fine-tuning.
