Machine Learning

The LoRA Adapters Deep

Approximating weight updates with small low rank matrices.

5 min read · core

A low rank update

LoRA, low rank adaptation, is a parameter efficient method built on a simple observation: the change a task requires to a weight matrix is often low rank. Instead of learning a full update, LoRA learns two small matrices whose product approximates it.

The decomposition

A weight update is represented as the product of a tall matrix and a wide matrix.

  • The frozen weight stays fixed.
  • A small matrix A and a small matrix B are trained.
  • Their product, scaled by a factor, is added to the frozen weight.

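The rank r is the inner dimension the two matrices share: the number of columns of the tall matrix and the number of rows of the wide one. It controls both the capacity of the update and the number of trainable parameters.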
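The decomposition can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation. It assumes the common convention in which B is the tall matrix (d_out × r), A is the wide matrix (r × d_in), B starts at zero, and the scaling factor is alpha / r. The names lora_forward, alpha, and the sizes used are hypothetical and chosen only for the example.

```python
import numpy as np

def lora_forward(x, W, A, B, scale):
    """Apply the frozen weight plus the scaled low rank update to x.

    Shapes: x (batch, d_in), W (d_out, d_in), B (d_out, r), A (r, d_in).
    """
    base = x @ W.T                      # frozen path, W is never updated
    update = scale * (x @ A.T) @ B.T    # low rank path, only A and B train
    return base + update

# Hypothetical sizes for illustration.
d_out, d_in, r = 64, 32, 4
alpha = 8.0  # assumed hyperparameter; the factor alpha / r is a common convention

rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))           # frozen weight
A = rng.normal(scale=0.01, size=(r, d_in))   # wide matrix, small random init
B = np.zeros((d_out, r))                     # tall matrix, zero init (assumed)

x = rng.normal(size=(3, d_in))
y = lora_forward(x, W, A, B, alpha / r)

# Parameter counts: the full update has d_out * d_in entries,
# the low rank pair has r * (d_in + d_out).
full_params = d_out * d_in
lora_params = r * (d_in + d_out)
print(full_params, lora_params)
```

With B initialized to zero the product is zero, so under this assumption the adapted layer starts out identical to the frozen one. Raising r grows the parameter count linearly in r rather than with the full size of the weight matrix.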

Why it is popular

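LoRA trains only a small fraction of the model's parameters yet often comes close to full fine tuning quality. Because the update is just two small matrices, many task adapters can be stored cheaply alongside a single copy of the base model. An adapter can also be merged into the base weights before inference: the scaled product has the same shape as the original matrix, so it can be added to it once, and the merged layer then costs exactly as much to run as the original.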
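The merging step can be checked numerically. The sketch below assumes the same shape convention as is common for LoRA (B tall, A wide) and uses random, untrained values purely for illustration. The helper name merge_lora and the value of scale are hypothetical.

```python
import numpy as np

def merge_lora(W, A, B, scale):
    """Fold the scaled low rank product into the base weight.

    The result has the same shape as W, so the layer needs no extra
    matrix multiply at inference time.
    """
    return W + scale * (B @ A)

rng = np.random.default_rng(0)
d_out, d_in, r = 16, 8, 2   # hypothetical sizes
scale = 2.0                 # hypothetical scaling factor

W = rng.normal(size=(d_out, d_in))   # frozen base weight
A = rng.normal(size=(r, d_in))       # stand-in for a trained wide matrix
B = rng.normal(size=(d_out, r))      # stand-in for a trained tall matrix
x = rng.normal(size=(5, d_in))

# Unmerged: base path plus a separate adapter path.
y_adapter = x @ W.T + scale * (x @ A.T) @ B.T

# Merged: a single matrix multiply with the folded weight.
W_merged = merge_lora(W, A, B, scale)
y_merged = x @ W_merged.T

outputs_match = np.allclose(y_adapter, y_merged)
print(outputs_match)
```

Subtracting the same product from W_merged recovers the base weight, which is one way to swap adapters on a shared base model.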

Key idea

LoRA learns a low rank product that approximates the weight update. It trains few parameters, and the resulting adapters can be stored cheaply or merged into the base weights so that inference adds no latency.

Check yourself

1. What does LoRA learn instead of a full weight update?

2. Why can a LoRA adapter add zero inference latency?