MoE-SVD: Structured Mixture-of-Experts LLMs Compression via Singular Value Decomposition

Abstract

The Mixture of Experts (MoE) architecture improves the scaling of Large Language Models (LLMs), but its larger parameter count and memory footprint make deployment challenging. In this paper, we present MoE-SVD, a new decomposition-based compression framework tailored to MoE LLMs that requires no extra training. Building on Singular Value Decomposition (SVD), MoE-SVD addresses two critical issues in MoE architectures: decomposition collapse and matrix redundancy. Specifically, we first decompose experts into compact low-rank matrices, which accelerates inference and reduces memory usage. In particular, we propose a selective decomposition strategy that measures sensitivity metrics based on weight singular values and activation statistics to automatically identify decomposable expert layers. Then, we share a single V-matrix across all experts and employ a top-k selection for U-matrices. This low-rank matrix sharing and trimming scheme allows for significant parameter reduction while preserving diversity among experts. Comprehensive experiments on Mixtral, Phi-3.5, DeepSeek, and Qwen2 MoE LLMs show that MoE-SVD outperforms other compression methods, achieving a 60% compression ratio and 1.5× faster inference with minimal performance loss.
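The abstract names three low-rank ideas: truncated SVD of an expert weight, a single V-matrix shared by all experts, and keeping only the top-k U-matrices. The following is a minimal NumPy sketch of those ideas under our own assumptions. It is not the paper's exact procedure. The shared V is taken here from the SVD of the vertically stacked expert weights. Each expert's U is the projection U_e = W_e V. The k retained U-matrices are chosen by a caller-supplied importance score (for example, router hit counts), and dropped experts reuse the nearest kept U. All three of these choices are illustrative assumptions. The sensitivity metric for selecting decomposable layers is not modeled. The function names `truncated_svd`, `shared_v_decompose`, and `keep_top_k_u` are hypothetical.

```python
import numpy as np


def truncated_svd(w: np.ndarray, rank: int) -> tuple[np.ndarray, np.ndarray]:
    """Factor w (d_out x d_in) as a @ b with a: d_out x rank, b: rank x d_in."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    # Fold the top singular values into the left factor.
    return u[:, :rank] * s[:rank], vt[:rank, :]


def shared_v_decompose(experts: list[np.ndarray], rank: int):
    """Hypothetical shared-V scheme: one right basis V (d_in x rank) for all experts.

    V spans the top-`rank` right singular subspace of the stacked expert weights.
    Each expert keeps its own left factor U_e = W_e @ V (d_out x rank).
    """
    stacked = np.vstack(experts)                   # (E * d_out) x d_in
    _, _, vt = np.linalg.svd(stacked, full_matrices=False)
    v = vt[:rank, :].T                             # orthonormal columns
    us = [w @ v for w in experts]
    return us, v


def keep_top_k_u(us: list[np.ndarray], scores, k: int):
    """Hypothetical trimming: keep the k U-matrices with the highest scores,
    then map every expert to the kept U closest to its own (Frobenius norm)."""
    kept = [int(j) for j in np.argsort(scores)[::-1][:k]]
    mapping = [
        min(kept, key=lambda j: np.linalg.norm(us[i] - us[j]))
        for i in range(len(us))
    ]
    return {j: us[j] for j in kept}, mapping


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_out, d_in, n_experts, rank, k = 64, 32, 8, 16, 4
    experts = [rng.standard_normal((d_out, d_in)) for _ in range(n_experts)]

    us, v = shared_v_decompose(experts, rank)
    kept_u, mapping = keep_top_k_u(us, rng.random(n_experts), k)
    w0_approx = kept_u[mapping[0]] @ v.T           # approximate expert 0

    dense = n_experts * d_out * d_in               # 16384 parameters
    compressed = len(kept_u) * d_out * rank + d_in * rank  # 4096 + 512
    print(compressed / dense)                      # → 0.28125
```

For a fixed V with orthonormal columns, U_e = W_e V minimizes the Frobenius error of W_e ≈ U_e Vᵀ. This is why the sketch needs only one SVD per layer. The shared V costs d_in × rank parameters once, rather than once per expert.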

Cite

Text

Li et al. "MoE-SVD: Structured Mixture-of-Experts LLMs Compression via Singular Value Decomposition." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Li et al. "MoE-SVD: Structured Mixture-of-Experts LLMs Compression via Singular Value Decomposition." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/li2025icml-moesvd/)

BibTeX

@inproceedings{li2025icml-moesvd,
  title     = {{MoE-SVD: Structured Mixture-of-Experts LLMs Compression via Singular Value Decomposition}},
  author    = {Li, Wei and Li, Lujun and Gu, Hao and Huang, You-Liang and Lee, Mark G. and Sun, Shengjie and Xue, Wei and Guo, Yike},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {35209--35230},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/li2025icml-moesvd/}
}