One model from many
Model merging combines several models, usually fine-tuned from the same base, into a single model by blending their weights directly. No further training is required, and the result can inherit strengths from each parent.
Why it can work
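Because the models share a common starting point, their weights tend to stay close enough in parameter space that the same coordinate plays a comparable role in each model. A useful concept is the task vector: the difference between a fine-tuned model's weights and the base model's weights. Merging often adds or averages these task vectors and applies the result to the base.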
- Averaging weights of models tuned on the same task can improve robustness.
- Adding task vectors from different tasks can give a multi-task model.
- Subtracting a task vector can remove an unwanted behavior.
The flow
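The flow can be sketched in three steps: compute each task vector against the base, combine the task vectors, then add the result back onto the base. The snippet below is a minimal illustration under stated assumptions, not a reference implementation: weights are plain dictionaries of flat float lists, and the names `task_vector`, `apply_task_vectors`, `negate`, and the toy models are hypothetical.

```python
from typing import Dict, List

# Assumption: a model is a dict mapping parameter names to flat lists of floats.
Weights = Dict[str, List[float]]


def task_vector(finetuned: Weights, base: Weights) -> Weights:
    """Task vector = fine-tuned weights minus base weights, per parameter."""
    return {k: [f - b for f, b in zip(finetuned[k], base[k])] for k in base}


def negate(vector: Weights) -> Weights:
    """Flip a task vector so that adding it subtracts the behavior."""
    return {k: [-x for x in v] for k, v in vector.items()}


def apply_task_vectors(base: Weights, vectors: List[Weights], scale: float = 1.0) -> Weights:
    """Add the scaled sum of the task vectors onto the base weights."""
    merged = {}
    for k, b in base.items():
        total = [sum(v[k][i] for v in vectors) for i in range(len(b))]
        merged[k] = [bi + scale * ti for bi, ti in zip(b, total)]
    return merged


# Toy models fine-tuned from the same base (values are made up).
base = {"w": [1.0, 2.0, 3.0]}
model_a = {"w": [1.5, 2.0, 3.0]}
model_b = {"w": [1.0, 2.0, 2.0]}

tv_a = task_vector(model_a, base)
tv_b = task_vector(model_b, base)

multi_task = apply_task_vectors(base, [tv_a, tv_b])           # add both task vectors
averaged = apply_task_vectors(base, [tv_a, tv_b], scale=0.5)  # same as averaging the two models
forget_b = apply_task_vectors(base, [negate(tv_b)])           # subtract a task vector
```

With two models and `scale=0.5`, the result equals the plain average of the two fine-tuned weight sets, which is why weight averaging and task vector arithmetic are often described together.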
Cautions
Merging works best when the models share the same base and architecture. Naive averaging can cause interference when task vectors conflict, for example when two models push the same weight in opposite directions, so some methods trim small updates or resolve sign disagreements before merging. When it succeeds, merging yields a capable model with no extra inference cost and no training run.
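One way to picture trimming and sign resolution is the simplified sketch below, loosely in the spirit of methods such as TIES-Merging rather than a faithful reproduction of any published algorithm. It assumes flat lists of floats as task vectors. The names `trim` and `merge_with_sign_election`, the `keep` parameter, and the example values are all hypothetical.

```python
from typing import List


def trim(vector: List[float], keep: int) -> List[float]:
    """Keep the `keep` largest-magnitude entries and zero out the rest.

    Entries tied with the threshold magnitude are all kept.
    """
    if keep <= 0:
        return [0.0] * len(vector)
    magnitudes = sorted((abs(x) for x in vector), reverse=True)
    threshold = magnitudes[min(keep, len(vector)) - 1]
    return [x if abs(x) >= threshold else 0.0 for x in vector]


def merge_with_sign_election(vectors: List[List[float]], keep: int) -> List[float]:
    """Trim each task vector, pick a sign per coordinate, average the agreeing values."""
    trimmed = [trim(v, keep) for v in vectors]
    merged = []
    for values in zip(*trimmed):
        total = sum(values)
        if total == 0:
            merged.append(0.0)  # nothing survived trimming, or the updates cancel out
            continue
        sign = 1 if total > 0 else -1  # elected direction for this coordinate
        agreeing = [x for x in values if x * sign > 0]
        merged.append(sum(agreeing) / len(agreeing))
    return merged


# Two made-up task vectors that disagree in sign on the first coordinate.
tv_a = [1.0, -0.25, 0.5, 0.0]
tv_b = [-0.5, 0.125, 0.5, 0.75]

naive = [(a + b) / 2 for a, b in zip(tv_a, tv_b)]
resolved = merge_with_sign_election([tv_a, tv_b], keep=3)
```

In this toy case the naive average shrinks the first coordinate to 0.25 because the two updates partly cancel, while the sign-elected merge keeps the dominant update of 1.0. The merged task vector would then be added onto the base weights as usual.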
Key idea
Model merging blends the weights of models fine-tuned from a shared base, often via task vectors, to combine skills cheaply. Care is needed to resolve interference between conflicting tasks.