ToMoE: Converting Dense Large Language Models to Mixture-of-Experts Through Dynamic Structural Pruning

Abstract

Large Language Models (LLMs) demonstrate remarkable capabilities but face deployment challenges due to their high computational demands. Traditional pruning methods reduce these costs by permanently removing parameters, which inevitably leads to performance degradation. To mitigate this issue, we propose ToMoE, a method that transforms dense LLMs into Mixture-of-Experts (MoE) models by uncovering experts inherently present within dense models, without requiring any weight updates. ToMoE leverages dynamic structural pruning to unify expert construction and router training in a single stage, achieving consistently strong performance. Remarkably, even without fine-tuning the model weights, ToMoE consistently outperforms state-of-the-art pruning and MoE techniques across Phi-2, LLaMA-2, LLaMA-3, and Qwen-2.5 models. The code for this paper is available at https://github.com/gaosh/ToMoE.
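The core idea described in the abstract, turning a dense model into an MoE by routing each token to a subset of the existing weights rather than deleting parameters, can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes a toy setup in which disjoint slices of a dense FFN's hidden neurons act as experts, the dense weights stay frozen, and only a small router is trained. The class name `DenseFFNAsMoE` and the dimensions are illustrative.

```python
# Minimal sketch (illustrative only): expose slices of a dense FFN as "experts"
# and route each token to one slice with a learned router, without updating the
# original dense weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseFFNAsMoE(nn.Module):
    def __init__(self, d_model=64, hidden_dim=256, num_experts=4):
        super().__init__()
        assert hidden_dim % num_experts == 0
        self.num_experts = num_experts
        self.slice = hidden_dim // num_experts
        # Dense FFN weights, as they would come from the original model (kept frozen).
        self.up = nn.Linear(d_model, hidden_dim, bias=False)
        self.down = nn.Linear(hidden_dim, d_model, bias=False)
        for p in (*self.up.parameters(), *self.down.parameters()):
            p.requires_grad_(False)
        # Only the router is trainable in this sketch.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x):  # x: (batch, seq, d_model)
        logits = self.router(x)          # (B, S, E) routing scores per token
        expert = logits.argmax(dim=-1)   # hard top-1 expert choice per token
        h = F.gelu(self.up(x))           # (B, S, hidden_dim) dense hidden activations
        # Mask activations so each token only uses its chosen expert's neuron slice.
        mask = torch.zeros_like(h)
        for e in range(self.num_experts):
            sel = (expert == e).unsqueeze(-1).float()          # (B, S, 1)
            mask[..., e * self.slice:(e + 1) * self.slice] += sel
        return self.down(h * mask)

x = torch.randn(2, 5, 64)
print(DenseFFNAsMoE()(x).shape)  # torch.Size([2, 5, 64])
```

In this toy version each token activates only `hidden_dim / num_experts` neurons per FFN, which mirrors how dynamic structural pruning trades compute for capacity while leaving the dense parameters intact; how experts are actually constructed and how the router is trained in ToMoE is detailed in the paper.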

Cite

Text

Gao et al. "ToMoE: Converting Dense Large Language Models to Mixture-of-Experts Through Dynamic Structural Pruning." Transactions on Machine Learning Research, 2026.

Markdown

[Gao et al. "ToMoE: Converting Dense Large Language Models to Mixture-of-Experts Through Dynamic Structural Pruning." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/gao2026tmlr-tomoe/)

BibTeX

@article{gao2026tmlr-tomoe,
  title     = {{ToMoE: Converting Dense Large Language Models to Mixture-of-Experts Through Dynamic Structural Pruning}},
  author    = {Gao, Shangqian and Hua, Ting and Shirkavand, Reza and Lin, Chi-Heng and Tang, Zheng and Li, Zhengao and Yuan, Longge and Li, Fangyi and Zhang, Zeyu and Ganjdanesh, Alireza and Lou, Qian and Xu, Jie and Hsu, Yen-Chang},
  journal   = {Transactions on Machine Learning Research},
  year      = {2026},
  url       = {https://mlanthology.org/tmlr/2026/gao2026tmlr-tomoe/}
}