ReMod: Learning Structured Sparsity with ReLU Modulation

Abstract

Large language models demand substantial computational resources for training and inference. Leveraging contextual sparsity to convert dense modules into a sparsely computed Mixture of Experts (MoE) offers a promising solution, but existing methods struggle to partition modules effectively and to handle the abrupt, non-differentiable changes introduced during conversion. We introduce ReMod (ReLU Modulation), which creates sparsity smoothly and differentiably while integrating clustering directly into training. Our method trains a small ReLU-gated modulator that scales hidden states to sparsify computation, then clusters the modulator weights to create structured sparsity that improves hardware utilization. When applied to the MLPs and attention projections of BERT-base, ReMod reduces inference FLOPs by up to 93% while maintaining comparable accuracy, significantly outperforming previous approaches.
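
The modulation mechanism described in the abstract can be illustrated with a minimal PyTorch sketch. The class name, the low-rank bottleneck gate architecture, and the parameter names below are assumptions made for illustration; the actual ReMod implementation may differ.

import torch
import torch.nn as nn

class ReLUModulator(nn.Module):
    """Hypothetical ReLU-gated modulator: scales hidden states with a
    learned gate whose ReLU output is exactly zero for many channels,
    so the modulated hidden states become contextually sparse."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        # Small bottleneck gate network (assumed architecture for illustration).
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # The ReLU produces exact zeros, which zero out the corresponding
        # channels of the hidden states under element-wise multiplication.
        gate = torch.relu(self.up(torch.relu(self.down(hidden_states))))
        return hidden_states * gate

After training, channels with similar gating behavior can be grouped (for example, by clustering the modulator weights) so that the resulting sparsity is structured rather than unstructured; that clustering step is only described in prose here.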

Cite

Text

Zhang and Ren. "ReMod: Learning Structured Sparsity with ReLU Modulation." ICLR 2025 Workshops: MCDC, 2025.

Markdown

[Zhang and Ren. "ReMod: Learning Structured Sparsity with ReLU Modulation." ICLR 2025 Workshops: MCDC, 2025.](https://mlanthology.org/iclrw/2025/zhang2025iclrw-remod/)

BibTeX

@inproceedings{zhang2025iclrw-remod,
  title     = {{ReMod: Learning Structured Sparsity with ReLU Modulation}},
  author    = {Zhang, Wenbo and Ren, Xiang},
  booktitle = {ICLR 2025 Workshops: MCDC},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/zhang2025iclrw-remod/}
}