LoRA Adapters

The motivation

Full fine-tuning of a large model updates billions of weights, which is expensive and produces a full-size copy of the model for every task. LoRA, short for low-rank adaptation, makes this cheap by training only small add-on matrices.

The low-rank trick

Instead of changing the original weight matrix, LoRA learns a small update expressed as the product of two thin matrices.

The big pretrained weights stay frozen
A pair of low-rank matrices learns the change
Their product is added to the frozen weights in the forward pass, and can be merged into them for inference
Only these small matrices are trained and saved
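
The four points above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it uses plain Python lists to stay dependency-free, the sizes and the names W, A, B, lora_forward and merge are assumptions chosen for the example, and the scaling factor that practical implementations usually apply to the update is left out. Starting B at zero, so that the update is zero before training, follows the usual LoRA initialization.

```python
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def add(X, Y):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

random.seed(0)
d_out, d_in, r = 4, 6, 2  # hypothetical sizes; r is the rank of the update

# Frozen pretrained weight (d_out x d_in): never updated during fine-tuning.
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]

# Trainable low-rank pair: B is d_out x r, A is r x d_in.
# B starts at zero, so the update B @ A is zero before any training.
B = [[0.0] * r for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]

def lora_forward(x, W, A, B):
    """Compute W x + B (A x) for a column vector x (d_in x 1)."""
    return add(matmul(W, x), matmul(B, matmul(A, x)))

def merge(W, A, B):
    """Fold the update into one matrix, W + B A, for inference."""
    return add(W, matmul(B, A))

x = [[float(i)] for i in range(1, d_in + 1)]

# Before training, the adapter leaves the base model's output unchanged.
print(lora_forward(x, W, A, B) == matmul(W, x))

# Stand-in for a trained B (random values, only for illustration).
B_trained = [[random.gauss(0, 1) for _ in range(r)] for _ in range(d_out)]
y_adapter = lora_forward(x, W, A, B_trained)
y_merged = matmul(merge(W, A, B_trained), x)
print(max(abs(a[0] - b[0]) for a, b in zip(y_adapter, y_merged)))
```

The last two lines compare the two ways of using an adapter: keeping it as a separate side path, or merging it into the frozen weights. The results agree up to floating-point rounding.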

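Because the update has low rank, it needs only a tiny fraction of the full matrix's parameters, often well under one percent: for a d-by-k weight matrix and rank r, the two thin matrices hold r(d + k) values instead of d times k.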
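
A quick calculation makes the saving concrete. The layer size (4096 by 4096) and the rank (8) below are assumed values picked for illustration, and lora_param_counts is a hypothetical helper, not part of any library.

```python
def lora_param_counts(d_out, d_in, r):
    """Return (full, lora) parameter counts for one weight matrix."""
    full = d_out * d_in        # every entry of the d_out x d_in matrix
    lora = r * (d_out + d_in)  # B is d_out x r, A is r x d_in
    return full, lora

full, lora = lora_param_counts(4096, 4096, 8)  # assumed layer size and rank
print(full, lora, f"{lora / full:.2%}")  # 16777216 65536 0.39%
```

This counts a single matrix. The fraction for a whole model depends on which of its matrices receive adapters and on the rank chosen.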

Why it is popular

LoRA adapters are small files you can swap per task while sharing one base model. LoRA is a form of parameter-efficient fine-tuning (PEFT). A common variant called QLoRA combines LoRA with quantization of the frozen base model, typically to 4 bits, so even very large models can be tuned on a single GPU.
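
Swapping adapters over a shared base can be pictured as a lookup keyed by task. The sketch below is a toy under stated assumptions: the task names, the 2-by-2 base weight, the rank-1 adapter values and the forward function are all made up for illustration and do not correspond to any particular library's API.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# One shared, frozen base weight (2 x 2); values are hypothetical.
BASE_W = [[1.0, 0.0],
          [0.0, 1.0]]

# Per-task adapters: each is a rank-1 pair (B is 2 x 1, A is 1 x 2).
ADAPTERS = {
    "summarize": {"B": [[1.0], [0.0]], "A": [[0.0, 2.0]]},
    "translate": {"B": [[0.0], [1.0]], "A": [[3.0, 0.0]]},
}

def forward(x, task=None):
    """Run the shared base, then add the chosen task's low-rank update."""
    y = matvec(BASE_W, x)
    if task is not None:
        adapter = ADAPTERS[task]
        delta = matvec(adapter["B"], matvec(adapter["A"], x))
        y = [yi + di for yi, di in zip(y, delta)]
    return y

x = [1.0, 1.0]
print(forward(x))               # [1.0, 1.0]
print(forward(x, "summarize"))  # [3.0, 1.0]
print(forward(x, "translate"))  # [1.0, 4.0]
```

The base weight is stored once. Each additional task costs only its own small pair of matrices.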

Key idea

LoRA freezes the base model and trains tiny low-rank matrices as the weight update, enabling cheap, swappable fine-tuning of large models.
