Small modules inside the network
Adapter layers are a parameter-efficient fine-tuning method that inserts small trainable modules into a frozen pretrained network. Each adapter is a bottleneck: it projects the hidden state down to a low dimension, applies a nonlinearity, then projects back up to the original size.
The bottleneck design
- A down projection shrinks the hidden size to a small bottleneck.
- A nonlinearity adds expressive power.
- An up projection restores the original size.
- A residual connection adds the adapter's output to its input, so with near-zero initial projections the adapter starts near identity.
Only these projections are trained; the surrounding transformer stays frozen.
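The four pieces above can be sketched as a forward pass. This is a minimal NumPy illustration, not a reference implementation: the class name `Adapter`, the sizes, the GELU nonlinearity, and the zero initialization of the up projection are all assumptions chosen to make the near-identity start visible.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU; the choice of nonlinearity is an assumption
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

class Adapter:
    """Hypothetical bottleneck adapter: down projection, nonlinearity, up projection, residual."""

    def __init__(self, hidden_size, bottleneck_size, rng):
        # Down projection: hidden_size -> bottleneck_size
        self.w_down = rng.normal(0.0, 0.02, size=(hidden_size, bottleneck_size))
        self.b_down = np.zeros(bottleneck_size)
        # Up projection: bottleneck_size -> hidden_size.
        # Zero init (an assumption here) makes the adapter an exact identity at the start.
        self.w_up = np.zeros((bottleneck_size, hidden_size))
        self.b_up = np.zeros(hidden_size)

    def __call__(self, h):
        z = gelu(h @ self.w_down + self.b_down)   # shrink, then nonlinearity
        return h + (z @ self.w_up + self.b_up)    # restore size, add residual

rng = np.random.default_rng(0)
adapter = Adapter(hidden_size=16, bottleneck_size=4, rng=rng)
h = rng.normal(size=(2, 5, 16))   # (batch, sequence, hidden)
out = adapter(h)
print(out.shape, np.allclose(out, h))
```

In a real fine-tuning setup only `w_down`, `b_down`, `w_up`, and `b_up` would receive gradient updates, while the surrounding transformer weights stay fixed.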
Where they go
Adapters are typically placed inside each transformer block, after the attention sublayer, after the feed-forward sublayer, or both.
Strengths and costs
Adapters can reach strong accuracy while training only a small fraction of the model's parameters, and each task is stored as a small, separate module. Because the residual lets them start near identity, training tends to be stable. The main cost is a slight inference overhead: unlike LoRA, whose weights can be merged into the base model, adapter layers run sequentially at inference and add some latency, though this is usually small relative to the cost of the full model.
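To make "few parameters" concrete, the trainable parameter count can be worked out directly from the bottleneck shape. The sizes below (hidden size 768, bottleneck 64, 24 adapters) are illustrative assumptions, not figures from any particular model, and `adapter_param_count` is a hypothetical helper.

```python
def adapter_param_count(hidden_size, bottleneck_size, num_adapters):
    """Trainable parameters for adapters with biased down and up projections."""
    down = hidden_size * bottleneck_size + bottleneck_size   # weights + bias
    up = bottleneck_size * hidden_size + hidden_size         # weights + bias
    return num_adapters * (down + up)

# Assumed setup: 12 transformer blocks with 2 adapters each.
total = adapter_param_count(hidden_size=768, bottleneck_size=64, num_adapters=24)
print(total)  # 2379264
```

The count grows linearly with the bottleneck size, so that one dimension is the main knob for trading adapter capacity against per-task storage.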
Key idea
Adapter layers insert small bottleneck modules with a residual into a frozen network, training only the projections for efficient task adaptation at a small inference cost.