Beyond the Permutation Symmetry of Transformers: The Role of Rotation for Model Fusion
Abstract
Symmetry in the parameter space of deep neural networks (DNNs) has proven beneficial for various deep learning applications. A well-known example is the permutation symmetry in Multi-Layer Perceptrons (MLPs), where permuting the rows of weight matrices in one layer and applying the inverse permutation to adjacent layers yields a functionally equivalent model. While permutation symmetry fully characterizes the equivalence set for MLPs, its discrete nature limits its utility for transformers. In this paper, we introduce rotation symmetry, a novel form of parameter space symmetry for transformers that generalizes permutation symmetry by rotating parameter matrices in self-attention layers. Unlike permutation symmetry, rotation symmetry operates in a continuous domain, thereby significantly expanding the equivalence set for transformers. Based on this property, we propose a theoretically optimal parameter matching algorithm as a plug-and-play module to enhance model fusion. We evaluate our approach using pre-trained transformers across diverse natural language and vision tasks. Experimental results demonstrate that our rotation symmetry-based matching algorithm substantially improves model fusion, highlighting the potential of parameter space symmetry to facilitate model fusion. Our code is available at https://github.com/zhengzaiyi/RotationSymmetry
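The two symmetries described in the abstract can be checked numerically. The sketch below is only an illustration under stated assumptions, not the paper's algorithm: it assumes a bias-free two-layer ReLU MLP for the permutation case, and for the rotation case it assumes that the same orthogonal matrix is applied to the query and key projections so that the unnormalized attention scores are unchanged. All names (`mlp`, `attention_scores`, `w_q`, `w_k`, `r`) are hypothetical and do not come from the paper or its repository.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, w2):
    # Bias-free two-layer MLP with ReLU; each row of w1 is one hidden unit.
    return np.maximum(x @ w1.T, 0.0) @ w2.T

def attention_scores(x, w_q, w_k):
    # Unnormalized attention scores (X W_Q)(X W_K)^T.
    return (x @ w_q) @ (x @ w_k).T

x = rng.normal(size=(5, 8))  # 5 tokens/samples, 8 input features

# Permutation symmetry (MLP): reorder the hidden units in the first layer
# and reorder the matching columns of the second layer.
w1 = rng.normal(size=(16, 8))
w2 = rng.normal(size=(4, 16))
perm = rng.permutation(16)
w1_p = w1[perm]      # permute rows of the first layer
w2_p = w2[:, perm]   # permute columns of the second layer to match
mlp_gap = np.max(np.abs(mlp(x, w1, w2) - mlp(x, w1_p, w2_p)))

# Rotation symmetry (attention, assumed form): multiply W_Q and W_K by the
# same orthogonal matrix R, so that R R^T = I cancels inside Q K^T.
w_q = rng.normal(size=(8, 6))
w_k = rng.normal(size=(8, 6))
r, _ = np.linalg.qr(rng.normal(size=(6, 6)))  # random orthogonal matrix
attn_gap = np.max(
    np.abs(attention_scores(x, w_q, w_k) - attention_scores(x, w_q @ r, w_k @ r))
)

print("MLP output gap after permutation:", mlp_gap)
print("Attention score gap after rotation:", attn_gap)
```

Both gaps should be at the level of floating-point round-off. A permutation matrix is itself orthogonal, which is one way to see why a continuous family of orthogonal transformations contains the discrete permutation case. Note that the QR factor here is orthogonal but may have determinant -1, so it is not guaranteed to be a proper rotation.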
Cite
Text
Zhang et al. "Beyond the Permutation Symmetry of Transformers: The Role of Rotation for Model Fusion." Proceedings of the 42nd International Conference on Machine Learning, 2025.
Markdown
[Zhang et al. "Beyond the Permutation Symmetry of Transformers: The Role of Rotation for Model Fusion." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/zhang2025icml-beyond-a/)
BibTeX
@inproceedings{zhang2025icml-beyond-a,
title = {{Beyond the Permutation Symmetry of Transformers: The Role of Rotation for Model Fusion}},
author = {Zhang, Binchi and Zheng, Zaiyi and Chen, Zhengzhang and Li, Jundong},
booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
year = {2025},
pages = {77090-77106},
volume = {267},
url = {https://mlanthology.org/icml/2025/zhang2025icml-beyond-a/}
}