The Adapter Layers

Small modules inside the network

Adapter layers are a parameter-efficient fine-tuning method that inserts small trainable modules into a frozen pretrained network. Each adapter is a bottleneck: it projects the hidden state down to a much lower dimension, applies a nonlinearity, then projects back up to the original dimension.

The bottleneck design

A down projection maps the hidden state from the model's hidden size to a much smaller bottleneck dimension.
A nonlinearity adds expressive power.
An up projection restores the original size.
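A residual connection adds the adapter's input back to its output, so when the projections are initialized near zero the adapter starts close to the identity function.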

Only these projections are trained; the surrounding transformer stays frozen.
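
The four steps above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the class name Adapter, the GELU nonlinearity, the initialization scale of 0.01, and the zero-initialized up projection are all assumptions made for the example. Zero-initializing the up projection is one common way to make the adapter an exact identity before training.

```python
import math
import random


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def gelu(x):
    """Tanh approximation of GELU; any smooth nonlinearity would work here."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


class Adapter:
    """Hypothetical bottleneck adapter: down-project, nonlinearity, up-project, residual."""

    def __init__(self, hidden_size, bottleneck_size, seed=0):
        rng = random.Random(seed)
        # Down projection (bottleneck_size x hidden_size), small random values.
        self.w_down = [[rng.gauss(0.0, 0.01) for _ in range(hidden_size)]
                       for _ in range(bottleneck_size)]
        self.b_down = [0.0] * bottleneck_size
        # Up projection (hidden_size x bottleneck_size), zero-initialized by
        # assumption so the adapter is exactly the identity before training.
        self.w_up = [[0.0] * bottleneck_size for _ in range(hidden_size)]
        self.b_up = [0.0] * hidden_size

    def forward(self, hidden):
        # 1. Project down to the bottleneck dimension.
        low = [z + b for z, b in zip(matvec(self.w_down, hidden), self.b_down)]
        # 2. Apply the nonlinearity.
        activated = [gelu(z) for z in low]
        # 3. Project back up to the hidden size.
        restored = [z + b for z, b in zip(matvec(self.w_up, activated), self.b_up)]
        # 4. Residual connection: add the adapter's input back.
        return [h + r for h, r in zip(hidden, restored)]


adapter = Adapter(hidden_size=8, bottleneck_size=2)
hidden = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5, 3.0, 0.25]
output = adapter.forward(hidden)
starts_as_identity = output == hidden
```

In a real setup only w_down, b_down, w_up, and b_up would receive gradient updates, while every weight of the surrounding transformer stays fixed.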

Where they go

Strengths and costs

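Adapters reach strong accuracy while training only a small fraction of the model's parameters, and each task is stored as a small module alongside the shared frozen network. Because the residual connection lets them start near the identity, training is stable. The main cost is a slight inference overhead: unlike LoRA updates, which can be merged into the frozen weights, an adapter contains a nonlinearity and cannot be folded into the surrounding layers, so it runs as extra sequential computation. The added latency is usually small relative to the rest of the model.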
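
To make "few parameters" concrete, the arithmetic below counts the trainable weights of one adapter and compares them with a single dense layer of the same width. The sizes (hidden size 768, bottleneck 64) and the function names are illustrative assumptions, not values taken from the text, and both counts include bias terms.

```python
def adapter_params(hidden_size, bottleneck_size):
    """Trainable parameters in one adapter: two projections plus their biases."""
    down = hidden_size * bottleneck_size + bottleneck_size
    up = bottleneck_size * hidden_size + hidden_size
    return down + up


def dense_params(hidden_size):
    """Parameters in one square dense layer with bias, for comparison."""
    return hidden_size * hidden_size + hidden_size


# Illustrative sizes only (assumed, not from the text).
hidden_size, bottleneck_size = 768, 64

adapter_count = adapter_params(hidden_size, bottleneck_size)  # 99136
dense_count = dense_params(hidden_size)                       # 590592
ratio = adapter_count / dense_count                           # about 0.17
```

Under these assumed sizes, one adapter holds roughly a sixth of the parameters of a single hidden-by-hidden dense layer, and the count falls linearly as the bottleneck shrinks.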

Key idea

Adapter layers insert small bottleneck modules with a residual into a frozen network, training only the projections for efficient task adaptation at a small inference cost.
