A low-rank update
LoRA (low-rank adaptation) is a parameter-efficient fine-tuning method built on a simple observation: the change a task requires to a pretrained weight matrix often has low rank, or is well approximated by a low-rank matrix. Instead of learning a full update, LoRA learns two small matrices whose product approximates it.
The decomposition
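A weight update is represented as the product of a tall matrix and a wide matrix. For a pretrained weight W0 of shape d × k, the update is ΔW = BA, where B is d × r (tall) and A is r × k (wide), with the rank r much smaller than both d and k.
- The pretrained weight W0 is frozen and receives no gradient updates.
- Only the small matrices A and B are trained.
- Their product BA, scaled by α/r, is added to W0, so the adapted weight is W0 + (α/r)BA.
The rank r is the shared inner dimension: B has r columns and A has r rows. It controls both the capacity of the update and its parameter count, which is r(d + k) instead of dk for a full update.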
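To make the savings concrete: a full update to a d × k matrix has dk entries, while the two factors B (d × r) and A (r × k) together have r(d + k). The worked example below takes a square 4096 × 4096 projection with rank r = 8; both sizes are chosen here purely for illustration.

```latex
\begin{aligned}
\text{full update: } & dk = 4096 \times 4096 = 16{,}777{,}216 \\
\text{LoRA factors: } & r(d + k) = 8 \times (4096 + 4096) = 65{,}536 \\
\text{ratio: } & \frac{65{,}536}{16{,}777{,}216} = \frac{1}{256} \approx 0.39\%
\end{aligned}
```

Because the count grows linearly in r, doubling the rank doubles the adapter size while the full-update count stays fixed.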
The structure
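The structure can be pictured as a layer with two parallel paths: the frozen weight W0 applied to the input, plus a low-rank path that multiplies by A, then by B, and scales the result by a factor α/r. The pure-Python sketch below illustrates that forward pass; it is not a library API, and the class name `LoRALinear`, the helper `matvec`, and the Gaussian initialization scale of 0.01 are assumptions for illustration. Following the LoRA paper, A starts random and B starts at zero, so the adapted layer initially behaves exactly like the frozen one.

```python
import random


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical minimal LoRA layer: frozen W0 plus a trainable low-rank path.

    Shapes: W0 is d x k, B is d x r, A is r x k.
    """

    def __init__(self, w0, r, alpha, seed=0):
        rng = random.Random(seed)
        self.w0 = w0                    # frozen pretrained weight (d x k)
        d, k = len(w0), len(w0[0])
        self.scale = alpha / r          # the alpha / r scaling factor
        # A: small random values (0.01 std is an illustrative choice)
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(k)] for _ in range(r)]
        # B: zeros, so B @ A = 0 and the layer starts identical to W0
        self.b = [[0.0] * r for _ in range(d)]

    def forward(self, x):
        base = matvec(self.w0, x)                # frozen path: W0 x
        low = matvec(self.b, matvec(self.a, x))  # low-rank path: B (A x)
        return [y + self.scale * z for y, z in zip(base, low)]

    def trainable_params(self):
        # Only A and B would be trained; W0 is excluded.
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])


# Usage: d = 2, k = 3, rank r = 1
w0 = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]]
layer = LoRALinear(w0, r=1, alpha=2.0)
print(layer.forward([1.0, 1.0, 1.0]))  # [3.0, 2.0], same as W0 x since B = 0
print(layer.trainable_params())        # 5, i.e. r * (d + k) = 1 * (2 + 3)
```

Computing B(Ax) rather than forming BA keeps the extra work per input vector proportional to r(d + k) instead of dk.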
Why it is popular
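LoRA trains a tiny fraction of the model's parameters, yet on many tasks it matches or comes close to the quality of full fine-tuning. Because each adapter is just two small matrices, many task-specific adapters can be stored cheaply alongside a single copy of the base model. At inference an adapter can also be merged into the base weights: the scaled product is added to the frozen matrix once, so the deployed layer is an ordinary dense matrix and adds no latency.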
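Merging works because matrix multiplication distributes over addition: for a scaling factor s, (W0 + s·BA)x = W0 x + s·B(Ax). The pure-Python sketch below uses hypothetical helper names (`merge`, `unmerge`, `adapter_forward`) and toy values chosen for illustration. It folds an adapter into the base matrix, checks that the merged layer gives the same output as the separate adapter path, and then subtracts the product again to recover the base weight so a different adapter could be swapped in.

```python
def matmul(p, q):
    """Multiply p (n x m) by q (m x t), both given as lists of rows."""
    return [[sum(p[i][j] * q[j][c] for j in range(len(q)))
             for c in range(len(q[0]))]
            for i in range(len(p))]


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def merge(w0, b, a, scale):
    """Return W0 + scale * (B A): one dense matrix with the adapter folded in."""
    ba = matmul(b, a)
    return [[w + scale * u for w, u in zip(wr, ur)] for wr, ur in zip(w0, ba)]


def unmerge(w, b, a, scale):
    """Subtract scale * (B A) to recover W0 (exact here; may round in general)."""
    ba = matmul(b, a)
    return [[x - scale * u for x, u in zip(wr, ur)] for wr, ur in zip(w, ba)]


def adapter_forward(w0, b, a, scale, x):
    """Unmerged path: W0 x + scale * B (A x)."""
    base = matvec(w0, x)
    low = matvec(b, matvec(a, x))
    return [y + scale * z for y, z in zip(base, low)]


# Usage: d = 2, k = 3, r = 1, scale = alpha / r = 2.0
w0 = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]]
b = [[1.0], [2.0]]         # d x r
a = [[0.5, 0.0, 1.0]]      # r x k
x = [1.0, 1.0, 1.0]

merged = merge(w0, b, a, 2.0)
print(matvec(merged, x))                 # [6.0, 8.0]
print(adapter_forward(w0, b, a, 2.0, x))  # [6.0, 8.0], identical output
print(unmerge(merged, b, a, 2.0) == w0)   # True
```

After merging, each forward pass is a single matrix product, which is why a merged adapter adds no inference latency; the cost is that the base weights are no longer shared across tasks until the adapter is subtracted again.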
Key idea
LoRA learns a low-rank product that approximates the weight update, so it trains few parameters, and its adapters can be stored cheaply or merged into the base weights at no extra inference cost.