MoDeGPT: Modular Decomposition for Large Language Model Compression

Abstract

Large Language Models (LLMs) have significantly advanced AI with their exceptional performance across a wide range of tasks. However, their extensive computational requirements restrict their use on devices with limited resources. While recent compression methods based on low-rank matrices show potential solutions, they often suffer from significant loss of accuracy or introduce substantial overhead in parameters and inference time. In this paper, we introduce Modular Decomposition (MoDeGPT), a new, efficient, and structured compression framework that overcomes these limitations. MoDeGPT jointly decomposes pairs of consecutive subcomponents within Transformer blocks, reduces hidden dimensions through output reconstruction on a larger structural scale than conventional low-rank methods, and repurposes three classical matrix decomposition algorithms—Nyström approximation, CR decomposition, and SVD—to ensure bounded errors in our novel decomposition approach. Our experiments show that MoDeGPT, without relying on backward propagation, consistently matches or surpasses the performance of prior techniques that depend on gradient information, while achieving a 98% reduction in compute costs when compressing a 13B-parameter model. On LLaMA-2/3 and OPT models, MoDeGPT retains 90-95% of zero-shot performance with compression rates of 25-30%. The compression process can be completed on a single GPU in a few hours, boosting inference throughput by up to 46%.
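To make the low-rank baseline the abstract contrasts against concrete, here is a minimal sketch of SVD-based truncation of a single weight matrix. This is a toy illustration of one of the three classical decompositions mentioned (SVD), not the paper's joint MoDeGPT procedure; the matrix sizes and the kept rank `r` are arbitrary choices for the example.

```python
import numpy as np

# Toy illustration (not the MoDeGPT algorithm itself): classic SVD-based
# low-rank compression of one weight matrix, W ≈ U_r @ V_r.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))

U, s, Vt = np.linalg.svd(W, full_matrices=False)
r = 64                      # kept rank; a hypothetical compression knob
U_r = U[:, :r] * s[:r]      # shape (512, r), singular values folded in
V_r = Vt[:r, :]             # shape (r, 512)

# Parameters drop from 512*512 to 2*512*r; by Eckart-Young, the error is
# bounded by the discarded singular values, so it is controlled, not ad hoc.
rel_err = np.linalg.norm(W - U_r @ V_r) / np.linalg.norm(W)
print(f"params: {W.size} -> {U_r.size + V_r.size}, rel. error: {rel_err:.3f}")
```

MoDeGPT's contribution is to apply such bounded-error factorizations jointly to pairs of consecutive Transformer subcomponents rather than to isolated matrices, avoiding the extra adapter parameters a per-matrix scheme would introduce.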

Cite

Text

Lin et al. "MoDeGPT: Modular Decomposition for Large Language Model Compression." International Conference on Learning Representations, 2025.

Markdown

[Lin et al. "MoDeGPT: Modular Decomposition for Large Language Model Compression." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/lin2025iclr-modegpt/)

BibTeX

@inproceedings{lin2025iclr-modegpt,
  title     = {{MoDeGPT: Modular Decomposition for Large Language Model Compression}},
  author    = {Lin, Chi-Heng and Gao, Shangqian and Smith, James Seale and Patel, Abhishek and Tuli, Shikhar and Shen, Yilin and Jin, Hongxia and Hsu, Yen-Chang},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/lin2025iclr-modegpt/}
}