Tuning the input, not the weights
Some parameter-efficient methods leave the entire model frozen and instead learn a small set of continuous vectors that condition it. These vectors are not real words but trainable embeddings, often called soft prompts or virtual tokens.
Two related ideas
- Prompt tuning prepends a few learned embedding vectors to the input sequence. Only those vectors are trained.
- Prefix tuning goes deeper: it prepends learned vectors to the keys and values at every attention layer, which gives it more steering power.
Both keep the backbone untouched and train only the added vectors.
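To make "train only the added vectors" concrete, here is a minimal sketch in plain Python. The backbone is a deliberately tiny stand-in, not a transformer: a frozen embedding table `E` and a frozen linear readout `w` over the mean-pooled sequence. All names, sizes, the squared-error loss, and the learning rate are assumptions chosen for illustration. The point is only that the update step touches `prompt` and nothing else.

```python
import random

random.seed(0)
d, vocab, k = 4, 10, 2   # embedding width, vocabulary size, soft-prompt length

# Frozen "backbone" (hypothetical stand-in): embedding table plus linear readout.
E = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(vocab)]
w = [random.uniform(-1, 1) for _ in range(d)]
frozen_snapshot = ([row[:] for row in E], w[:])

# Trainable soft prompt: k free vectors that correspond to no vocabulary entry.
prompt = [[0.0] * d for _ in range(k)]

def forward(prompt, tokens):
    rows = prompt + [E[t] for t in tokens]   # prepend the virtual tokens
    pooled = [sum(r[c] for r in rows) / len(rows) for c in range(d)]
    score = sum(wc * pc for wc, pc in zip(w, pooled))
    return score, len(rows)

tokens, target, lr = [3, 1, 7], 1.0, 0.5
losses = []
for step in range(200):
    score, n_rows = forward(prompt, tokens)
    err = score - target
    losses.append(err * err)
    # d(loss)/d(prompt[j][c]) = 2 * err * w[c] / n_rows.
    # E and w receive no update: they are the frozen model.
    for j in range(k):
        for c in range(d):
            prompt[j][c] -= lr * 2 * err * w[c] / n_rows

print(losses[0], losses[-1])
```

In a real framework the same effect is usually obtained by disabling gradients on the backbone parameters and handing only the soft prompt to the optimizer.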
The structure
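The two placements can be sketched with a toy single-head attention written in plain Python. This is an illustrative sketch under simplifying assumptions, not a reference implementation: the frozen projection matrices of a real layer are replaced by the identity, there are no residual connections or feed-forward blocks, and every name and size (`n`, `k`, `d`, `num_layers`, `soft_prompt`, `prefixes`) is hypothetical. The random values stand in for vectors that training would learn.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def attention(queries, keys, values):
    """Single-head scaled dot-product attention over plain lists."""
    scale = math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, key)) / scale
                           for key in keys])
        out.append([sum(wt * v[c] for wt, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
n, k, d, num_layers = 5, 3, 4, 2   # real tokens, virtual tokens, width, layers

x = rand_matrix(n, d, rng)         # embeddings of the real input tokens

# Prompt tuning: one trainable (k, d) block, prepended once at the input.
soft_prompt = rand_matrix(k, d, rng)
prompt_tuned_input = soft_prompt + x   # the frozen model now sees k + n positions

# Prefix tuning: a trainable key block and value block for every layer.
prefixes = [(rand_matrix(k, d, rng), rand_matrix(k, d, rng))
            for _ in range(num_layers)]

h = x
for prefix_k, prefix_v in prefixes:
    # Queries come from the n real positions only; each layer's keys and
    # values are extended with that layer's learned prefix.
    h = attention(h, prefix_k + h, prefix_v + h)

print(len(prompt_tuned_input), len(h))
```

With prompt tuning the sequence itself grows to k + n positions, whereas with prefix tuning the hidden sequence stays at n positions and only the keys and values each layer attends over are lengthened.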
Trade-offs
These methods store very few parameters per task: just the soft prompt or prefix. Prompt tuning becomes more effective as models grow larger. Prefix tuning, by injecting at every layer, usually adapts better on smaller models, at the cost of more trainable parameters, since it keeps separate key and value vectors for each layer. Neither changes the original weights, so one frozen model can serve many tasks by swapping in the vectors learned for each.
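A rough count shows how the two methods compare in storage. The sketch below assumes the simplest formulation: prompt tuning stores one vector per virtual token, and prefix tuning stores one key vector and one value vector per virtual token per layer. The model sizes are illustrative, not taken from any particular model, and the count ignores any auxiliary network used only during training.

```python
def prompt_tuning_params(prompt_len, d_model):
    # One d_model-wide vector per virtual token, at the input only.
    return prompt_len * d_model

def prefix_tuning_params(prefix_len, d_model, num_layers):
    # A key vector and a value vector per virtual token, for every layer.
    return 2 * prefix_len * d_model * num_layers

# Illustrative sizes (assumed for the example).
d_model, num_layers, length = 768, 12, 20

prompt_count = prompt_tuning_params(length, d_model)
prefix_count = prefix_tuning_params(length, d_model, num_layers)

print(prompt_count)   # 15360
print(prefix_count)   # 368640
```

Under these assumptions the prefix is 2 × num_layers times larger than the prompt, yet both remain tiny next to a backbone with hundreds of millions of weights.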
Key idea
Prefix and prompt tuning freeze the model and learn small continuous vectors that steer it: prompt tuning places them at the input, while prefix tuning injects them into every attention layer.