Machine Learning

The Adapter Layers

Inserting small bottleneck modules between frozen transformer layers.

Small modules inside the network

Adapter layers are a parameter-efficient fine-tuning method that inserts tiny trainable modules into a frozen pretrained network. Each adapter is a small bottleneck: it projects the hidden state down to a low dimension, applies a nonlinearity, then projects back up to the original dimension.

The bottleneck design

  • A down projection shrinks the hidden size to a small bottleneck.
  • A nonlinearity adds expressive power.
  • An up projection restores the original size.
  • A residual connection adds the adapter's input back to its output, so the adapter starts near identity when the up projection is initialized near zero.

Only these projections are trained; the surrounding transformer stays frozen.
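
The four pieces listed above can be sketched in a few lines. This is a minimal illustration in plain Python, not a reference implementation: the class name Adapter, the choice of ReLU as the nonlinearity, and the zero initialization of the up projection are assumptions made for the example, and a real model would use a tensor library rather than lists.

```python
import random


def relu(v):
    # Elementwise nonlinearity (ReLU is an assumption; the lesson does not name one).
    return [max(0.0, x) for x in v]


def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class Adapter:
    """Hypothetical bottleneck adapter: down projection, nonlinearity, up projection, residual."""

    def __init__(self, hidden_size, bottleneck_size, seed=0):
        rng = random.Random(seed)
        # Down projection: hidden_size -> bottleneck_size, small random init.
        self.W_down = [[rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
                       for _ in range(bottleneck_size)]
        self.b_down = [0.0] * bottleneck_size
        # Up projection: bottleneck_size -> hidden_size.
        # Zero init (an assumption) makes the adapter an exact identity at the start.
        self.W_up = [[0.0] * bottleneck_size for _ in range(hidden_size)]
        self.b_up = [0.0] * hidden_size

    def __call__(self, h):
        z = [a + b for a, b in zip(matvec(self.W_down, h), self.b_down)]
        z = relu(z)
        out = [a + b for a, b in zip(matvec(self.W_up, z), self.b_up)]
        # Residual connection: add the adapter output back to its input.
        return [hi + oi for hi, oi in zip(h, out)]


adapter = Adapter(hidden_size=8, bottleneck_size=2)
h = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5, 3.0, -2.0]
y = adapter(h)  # equals h before any training, because W_up and b_up are zero
```

Only W_down, b_down, W_up, and b_up would receive gradient updates; the hidden state h comes from the frozen transformer, whose weights are left untouched.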

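Where they go

Adapters are inserted between the frozen transformer layers. A common placement is after the attention and feed-forward sublayers of each block.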

Strengths and costs

Adapters reach strong accuracy while training few parameters, and each task is stored as its own small module. Because the residual lets them start near identity, training is stable. The main cost is a slight inference overhead: LoRA weights can be merged into the frozen weights after training, but adapter layers cannot, so they run sequentially and add a little latency, though it is usually small relative to the model.
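
To make "few parameters" concrete, the count for one adapter can be worked out directly. The sketch below assumes each projection has a weight matrix and a bias; the hidden size of 768 and bottleneck size of 64 are hypothetical values picked for illustration, not figures from this lesson.

```python
def adapter_params(hidden_size, bottleneck_size):
    """Trainable parameters in one adapter: two projections plus their biases."""
    down = hidden_size * bottleneck_size + bottleneck_size  # weights + bias
    up = bottleneck_size * hidden_size + hidden_size        # weights + bias
    return down + up


hidden_size = 768      # hypothetical hidden size
bottleneck_size = 64   # hypothetical bottleneck size

per_adapter = adapter_params(hidden_size, bottleneck_size)
dense_matrix = hidden_size * hidden_size  # one full hidden-to-hidden weight matrix

print(per_adapter)   # 99136
print(dense_matrix)  # 589824
```

Under these assumptions one adapter is several times smaller than a single full-size weight matrix of the frozen model, and shrinking the bottleneck reduces the count roughly in proportion.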

Key idea

Adapter layers insert small bottleneck modules with a residual into a frozen network, training only the projections for efficient task adaptation at a small inference cost.

Check yourself

1. What is the shape of an adapter module?

2. Why do adapters use a residual connection?