The LoRA Adapters Deep

A low rank update

LoRA (low-rank adaptation) is a parameter-efficient fine-tuning method built on a simple observation: the change a task requires to a weight matrix is often low rank, or close to it. Instead of learning a full update, LoRA learns two small matrices whose product approximates it.

The decomposition

For a frozen weight of shape d × k, the update is represented as the product of a tall matrix B (d × r) and a wide matrix A (r × k), where r is much smaller than d and k.

The frozen weight stays fixed.
A small matrix A and a small matrix B are trained.
Their product, scaled by a factor, is added to the frozen weight.

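The rank r is the inner dimension of the product: the tall factor has r columns and the wide factor has r rows. It controls both capacity and parameter count. For a d × k weight, the two factors hold r(d + k) trainable parameters instead of the dk a full update would need.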
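
The decomposition and its parameter count can be checked with a small NumPy sketch. The sizes, the helper names lora_delta and param_counts, and the scaling factor alpha / r are assumptions made for illustration; the text above only says the product is "scaled by a factor", and alpha / r is one common convention.

```python
import numpy as np

def lora_delta(B, A, alpha):
    """Return the scaled low-rank update (alpha / r) * B @ A."""
    r = B.shape[1]
    assert A.shape[0] == r, "inner dimensions must both equal the rank"
    return (alpha / r) * (B @ A)

def param_counts(d_out, d_in, r):
    """Trainable parameters for a full update vs. the two LoRA factors."""
    return d_out * d_in, r * (d_out + d_in)

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 48, 4, 8.0   # illustrative sizes, not from the text

B = rng.normal(size=(d_out, r))   # tall factor: d_out x r
A = rng.normal(size=(r, d_in))    # wide factor: r x d_in
delta = lora_delta(B, A, alpha)

full, lora = param_counts(d_out, d_in, r)
print(delta.shape)                    # same shape as the frozen weight
print(np.linalg.matrix_rank(delta))   # never exceeds r
print(full, lora)                     # 3072 vs. 448 for these sizes
```

The update has the full d_out × d_in shape, but its rank is bounded by r, which is where the parameter saving comes from.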

The structure
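
One way to picture the structure is as a frozen linear map with a trainable low-rank branch running beside it. The class below is a minimal sketch under stated assumptions, not a reference implementation: the name LoRALinear, the alpha / r scale, and the initialisation (small random A, zero B, so the adapter starts as a no-op) are illustrative choices, and no training loop is shown.

```python
import numpy as np

class LoRALinear:
    """Frozen linear map plus a trainable low-rank branch (illustrative)."""

    def __init__(self, W, r, alpha, rng):
        d_out, d_in = W.shape
        self.W = W                                        # frozen, never updated
        self.A = rng.normal(scale=0.01, size=(r, d_in))   # trainable wide factor
        self.B = np.zeros((d_out, r))                     # trainable tall factor, zero at start
        self.scale = alpha / r                            # assumed scaling convention

    def __call__(self, x):
        # x has shape (batch, d_in). The low-rank branch passes through an
        # r-dimensional bottleneck, so the d_out x d_in update is never formed.
        base = x @ self.W.T
        low_rank = (x @ self.A.T) @ self.B.T
        return base + self.scale * low_rank

rng = np.random.default_rng(0)
W = rng.normal(size=(6, 5))
layer = LoRALinear(W, r=2, alpha=4.0, rng=rng)
x = rng.normal(size=(3, 5))
y = layer(x)   # equals x @ W.T while B is still zero
```

Because B starts at zero in this sketch, the adapted layer initially reproduces the frozen layer exactly, and training moves it away from that starting point.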

Why it is popular

LoRA trains only a small fraction of the model's parameters, yet often comes close to the quality of full fine-tuning. Because the update is just two small matrices, many task adapters can be stored cheaply alongside a single base model. An adapter can also be merged into the base weights for inference: the scaled product has the same shape as the original matrix, so it can be folded in ahead of time and adds no latency.
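
The merging claim can be verified numerically. The sketch below assumes hypothetical helpers merge and unmerge and arbitrary small sizes; it compares the two-branch computation with a single matrix multiply against the folded weight.

```python
import numpy as np

def merge(W, B, A, scale):
    """Fold the scaled low-rank product into the base weight."""
    return W + scale * (B @ A)

def unmerge(W_merged, B, A, scale):
    """Recover the base weight, e.g. before swapping in another adapter."""
    return W_merged - scale * (B @ A)

rng = np.random.default_rng(1)
d_out, d_in, r = 6, 5, 2          # illustrative sizes
W = rng.normal(size=(d_out, d_in))
B = rng.normal(size=(d_out, r))
A = rng.normal(size=(r, d_in))
scale = 0.5
x = rng.normal(size=(3, d_in))

# Unmerged: base path plus the low-rank branch (two extra small matmuls).
two_branch = x @ W.T + scale * (x @ A.T) @ B.T

# Merged: one matmul against a weight of the original shape.
W_merged = merge(W, B, A, scale)
one_matmul = x @ W_merged.T

print(np.allclose(two_branch, one_matmul))                # True
print(np.allclose(unmerge(W_merged, B, A, scale), W))     # True
```

The merged weight has the same shape as the original, so inference cost is unchanged. Keeping B and A around lets the adapter be subtracted again, up to floating-point error.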

Key idea

LoRA learns a low-rank product that approximates the weight update. It trains few parameters, and the resulting adapters can be stored cheaply or merged into the base weights so that inference costs nothing extra.
