The Pitfalls of Text Degeneration When Aligning LLMs Through Model Merge

Abstract

Model merge offers a cost-efficient method for integrating multiple specialized large language models (LLMs) into one comprehensive model. While it shows promise for encoder-decoder models in standard Natural Language Processing (NLP) tasks, we find that merging decoder-based LLMs may lead to localized text degeneration, even when overall performance appears to improve. We specifically assess the applications of model merge in steering LLMs to align better with diverse human preferences through interpolation and extrapolation merge. Our extensive experiments, covering model sizes ranging from 7B to 70B parameters and including sixteen models with varying post-training, employ three popular merging methods: Task Arithmetic, TIES-Merging, and Dare-TIES. Our results uncover inherent limitations in current model merge applications for alignment, which can lead to text degeneration. We hope our findings will offer valuable insights for employing model merging in alignment scenarios and can help practitioners avoid potential pitfalls.
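
For readers unfamiliar with the terminology, the following is a minimal sketch of Task Arithmetic and of interpolation versus extrapolation merging as they are commonly described, not the authors' implementation. It uses toy weight dictionaries in place of real checkpoints. The helper names (`task_vector`, `task_arithmetic`, `blend`), the `scale` and `alpha` parameters, and the convention that `alpha` outside [0, 1] counts as extrapolation are assumptions made for illustration.

```python
from typing import Dict, List

# Toy stand-in for a model state dict: parameter name -> flat list of weights.
Weights = Dict[str, List[float]]


def task_vector(finetuned: Weights, base: Weights) -> Weights:
    """Difference between a fine-tuned model and the shared base model."""
    return {k: [f - b for f, b in zip(finetuned[k], base[k])] for k in base}


def task_arithmetic(base: Weights, finetuned: List[Weights], scale: float) -> Weights:
    """Add the scaled task vector of each fine-tuned model to the base weights."""
    merged = {k: list(v) for k, v in base.items()}
    for model in finetuned:
        tau = task_vector(model, base)
        for k in merged:
            merged[k] = [m + scale * t for m, t in zip(merged[k], tau[k])]
    return merged


def blend(model_a: Weights, model_b: Weights, alpha: float) -> Weights:
    """Linear blend of two models (assumed convention):
    0 <= alpha <= 1 interpolates between them, alpha > 1 extrapolates past model_b."""
    return {
        k: [(1 - alpha) * a + alpha * b for a, b in zip(model_a[k], model_b[k])]
        for k in model_a
    }


base = {"w": [0.0, 1.0]}
model_a = {"w": [1.0, 1.0]}
model_b = {"w": [0.0, 3.0]}

merged = task_arithmetic(base, [model_a, model_b], scale=0.5)
interpolated = blend(model_a, model_b, alpha=0.5)
extrapolated = blend(model_a, model_b, alpha=2.0)
```

TIES-Merging and Dare-TIES add pruning and sign-resolution steps on top of the task vectors; those steps are omitted from this sketch.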

Cite

Text

Qing et al. "The Pitfalls of Text Degeneration When Aligning LLMs Through Model Merge." Transactions on Machine Learning Research, 2026.

Markdown

[Qing et al. "The Pitfalls of Text Degeneration When Aligning LLMs Through Model Merge." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/qing2026tmlr-pitfalls/)

BibTeX

@article{qing2026tmlr-pitfalls,
  title     = {{The Pitfalls of Text Degeneration When Aligning LLMs Through Model Merge}},
  author    = {Qing, Peijun and Hsiung, Lei and Zhang, Hefan and Lu, Haiquan and Diao, Xingjian and Ma, Chiyu and Hassanpour, Saeed and Vosoughi, Soroush},
  journal   = {Transactions on Machine Learning Research},
  year      = {2026},
  url       = {https://mlanthology.org/tmlr/2026/qing2026tmlr-pitfalls/}
}