How Does the Optimizer Implicitly Bias the Model Merging Loss Landscape?
Abstract
Model merging combines independent solutions with different capabilities into a single one while maintaining the same inference cost. Two popular approaches are _linear interpolation_, which simply averages multiple model weights, and _task arithmetic_, which combines task vectors obtained as the difference between finetuned and base models. While merging is useful in practice, the properties that make it effective are poorly understood. This paper explores how optimization dynamics affect the loss landscape geometry and, in turn, merging success. We show that a single quantity -- the _effective noise scale_ -- unifies the impact of different optimizer components on model merging. Across architectures and datasets, merging success is a non-monotonic function of the effective noise scale, with a distinct optimum. Decomposing this quantity, we find that larger learning rates, stronger weight decay, smaller batch sizes, and data augmentation all independently modulate the effective noise scale and exhibit the same qualitative trend. Unlike prior work connecting optimizer noise to the flatness or generalization of _individual_ minima, we show that it also affects the _global_ loss landscape, predicting when independently trained solutions can be successfully merged. Our findings broaden the understanding of how optimization shapes the loss landscape geometry and its consequences for model merging, suggesting that training dynamics could be further manipulated to improve model merging.
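The two merging approaches named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Weights` format (parameter name mapped to a flat list of floats), the function names, and the `scale` coefficient on the task vectors are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence

# Hypothetical parameter format for this sketch: name -> flat list of values.
Weights = Dict[str, List[float]]


def linear_interpolation(models: Sequence[Weights]) -> Weights:
    """Average the parameters of several models that share the same shapes."""
    n = len(models)
    return {
        name: [sum(m[name][i] for m in models) / n for i in range(len(values))]
        for name, values in models[0].items()
    }


def task_arithmetic(
    base: Weights, finetuned: Sequence[Weights], scale: float = 1.0
) -> Weights:
    """Add the scaled sum of task vectors (finetuned - base) to the base model.

    The `scale` coefficient is an assumption of this sketch, not taken from the paper.
    """
    merged: Weights = {}
    for name, base_values in base.items():
        merged[name] = [
            b + scale * sum(m[name][i] - b for m in finetuned)
            for i, b in enumerate(base_values)
        ]
    return merged


# Toy example with a single two-element parameter.
base = {"w": [0.0, 1.0]}
model_a = {"w": [1.0, 1.0]}
model_b = {"w": [0.0, 3.0]}

averaged = linear_interpolation([model_a, model_b])
merged = task_arithmetic(base, [model_a, model_b], scale=0.5)
```

In this toy case both results coincide, because task arithmetic with `scale` set to one over the number of finetuned models reduces algebraically to a plain average of those models; other values of `scale` move the merged weights further from or closer to the base model.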
Cite
Text
Zhang et al. "How Does the Optimizer Implicitly Bias the Model Merging Loss Landscape?." International Conference on Learning Representations, 2026.
Markdown
[Zhang et al. "How Does the Optimizer Implicitly Bias the Model Merging Loss Landscape?." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zhang2026iclr-optimizer/)
BibTeX
@inproceedings{zhang2026iclr-optimizer,
title = {{How Does the Optimizer Implicitly Bias the Model Merging Loss Landscape?}},
author = {Zhang, Chenxiang and Theus, Alexander and Teney, Damien and Orvieto, Antonio and Pang, Jun and Mauw, Sjouke},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/zhang2026iclr-optimizer/}
}