Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models

Abstract

Scaling the capacity of language models has consistently proven to be a reliable approach for improving performance and unlocking new capabilities. Capacity can be primarily defined by two dimensions: the number of model parameters and the compute per example. While scaling typically involves increasing both, the precise interplay between these factors and their combined contribution to overall capacity remains not fully understood. We explore this relationship in the context of sparse Mixture-of-Experts models (MoEs), which allow scaling the number of parameters without proportionally increasing the FLOPs per example. We investigate how varying the sparsity level, i.e., the fraction of inactive parameters, impacts a model's performance during pretraining and downstream evaluation. We find that under different constraints (e.g., parameter size and total training compute), there is an optimal level of sparsity that improves both training efficiency and model performance. These results provide a better understanding of the impact of sparsity in scaling laws for MoEs and complement existing works in this area, offering insights for designing more efficient architectures.
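
To make the parameters-vs-FLOPs decoupling concrete, here is a minimal sketch of how sparsity, total parameters, and per-token compute relate in a standard top-k routed MoE. The layer sizes and the "FLOPs ≈ 2 × active parameters per token" rule of thumb are illustrative assumptions for this sketch, not values or code from the paper.

```python
# Toy accounting for a top-k routed MoE: growing the expert count scales
# total parameters, while FLOPs per token are pinned by top_k. Sparsity is
# the fraction of inactive parameters, as in the abstract's definition.

def moe_stats(shared_params: int, expert_params: int, num_experts: int, top_k: int):
    """Return total params, active params per token, sparsity, and a FLOP estimate."""
    total = shared_params + num_experts * expert_params
    active = shared_params + top_k * expert_params   # only top_k experts fire per token
    sparsity = 1.0 - active / total                  # fraction of inactive parameters
    flops_per_token = 2 * active                     # common forward-pass approximation
    return total, active, sparsity, flops_per_token

for num_experts in (8, 32, 128):
    total, active, sparsity, flops = moe_stats(
        shared_params=1_000_000_000,   # hypothetical dense/shared weights
        expert_params=100_000_000,     # hypothetical per-expert weights
        num_experts=num_experts,
        top_k=2,
    )
    print(f"E={num_experts:4d}  total={total:.2e}  active={active:.2e}  "
          f"sparsity={sparsity:.2f}  FLOPs/token~{flops:.2e}")
```

Under these assumed sizes, going from 8 to 128 experts roughly multiplies total parameters while active parameters, and hence FLOPs per token, stay constant; sweeping `num_experts` at fixed `top_k` is one way to vary the sparsity level the paper studies.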

Cite

Text

Abnar et al. "Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models." ICLR 2025 Workshops: SLLM, 2025.

Markdown

[Abnar et al. "Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models." ICLR 2025 Workshops: SLLM, 2025.](https://mlanthology.org/iclrw/2025/abnar2025iclrw-parameters/)

BibTeX

@inproceedings{abnar2025iclrw-parameters,
  title     = {{Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models}},
  author    = {Abnar, Samira and Shah, Harshay and Busbridge, Dan and El-Nouby, Alaaeldin and Susskind, Joshua M. and Thilak, Vimal},
  booktitle = {ICLR 2025 Workshops: SLLM},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/abnar2025iclrw-parameters/}
}