Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
Abstract
Scaling the capacity of language models has consistently proven to be a reliable approach for improving performance and unlocking new capabilities. Capacity can be primarily defined by two dimensions: the number of model parameters and the compute per example. While scaling typically involves increasing both, the precise interplay between these factors and their combined contribution to overall capacity remains not fully understood. We explore this relationship in the context of sparse Mixture-of-Experts models (MoEs), which allow scaling the number of parameters without proportionally increasing the FLOPs per example. We investigate how varying the sparsity level, i.e., the fraction of inactive parameters, impacts a model's performance during pretraining and downstream evaluation. We find that under different constraints (e.g., parameter size and total training compute), there is an optimal level of sparsity that improves both training efficiency and model performance. These results provide a better understanding of the impact of sparsity in scaling laws for MoEs and complement existing works in this area, offering insights for designing more efficient architectures.
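To make the parameters-vs-FLOPs decoupling concrete, here is a minimal sketch (not from the paper) assuming a standard top-k routed MoE feed-forward layer; all function names and the example dimensions below are illustrative assumptions. It shows how sparsity, defined as the fraction of inactive expert parameters per token, lets total parameters grow while per-token compute stays fixed by top_k.

```python
# Illustrative sketch only: how top-k routing in an MoE decouples total
# parameter count (capacity) from FLOPs per token (compute per example).
# All names and numbers are assumptions, not the paper's implementation.

def moe_sparsity(num_experts: int, top_k: int) -> float:
    """Sparsity as the fraction of expert parameters inactive per token."""
    return 1.0 - top_k / num_experts

def expert_params(d_model: int, d_ff: int, num_experts: int, top_k: int):
    """Expert parameters in one MoE FFN layer (two weight matrices per expert)."""
    params_per_expert = 2 * d_model * d_ff
    total = num_experts * params_per_expert   # scales model capacity
    active = top_k * params_per_expert        # scales FLOPs per token
    return total, active

# Example: 64 experts, 2 routed per token -> ~96.9% of expert params are
# inactive, so total parameters are 32x the parameters active per token.
total, active = expert_params(d_model=1024, d_ff=4096, num_experts=64, top_k=2)
print(f"sparsity = {moe_sparsity(64, 2):.4f}")   # 0.9688
print(f"total expert params = {total:,}, active per token = {active:,}")
```

Under this framing, the paper's question is which point on the sparsity axis (for a fixed parameter budget or fixed training compute) yields the best loss, and the abstract's answer is that an optimal, non-trivial sparsity level exists.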
Cite
Text
Abnar et al. "Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models." ICLR 2025 Workshops: SLLM, 2025.
Markdown
[Abnar et al. "Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models." ICLR 2025 Workshops: SLLM, 2025.](https://mlanthology.org/iclrw/2025/abnar2025iclrw-parameters/)
BibTeX
@inproceedings{abnar2025iclrw-parameters,
title = {{Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models}},
author = {Abnar, Samira and Shah, Harshay and Busbridge, Dan and El-Nouby, Alaaeldin and Susskind, Joshua M. and Thilak, Vimal},
booktitle = {ICLR 2025 Workshops: SLLM},
year = {2025},
url = {https://mlanthology.org/iclrw/2025/abnar2025iclrw-parameters/}
}