Mixture of Experts for Image Classification: What's the Sweet Spot?

Abstract

Mixture-of-Experts (MoE) models have shown promising potential for parameter-efficient scaling across domains. However, their application to image classification remains limited, often requiring billion-scale datasets to be competitive. In this work, we explore the integration of MoE layers into image classification architectures using open datasets. We conduct a systematic analysis across different MoE configurations and model scales. We find that activating a moderate number of parameters per sample provides the best trade-off between performance and efficiency; as the number of activated parameters increases further, the benefits of MoE diminish. Our analysis yields several practical insights for vision MoE design. First, MoE layers most effectively strengthen tiny and mid-sized models, while gains taper off for large-capacity networks and do not redefine state-of-the-art ImageNet performance. Second, a Last-2 placement heuristic offers the most robust cross-architecture choice, with Every-2 slightly better for Vision Transformers (ViT), and both remaining effective as data and model scale increase. Third, larger datasets (e.g., ImageNet-21k) allow more experts (up to 16 for ConvNeXt) to be used effectively without changing placement, as increased data reduces overfitting and promotes broader expert specialization. Finally, a simple linear router performs best, suggesting that additional routing complexity yields no consistent benefit.
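To make the design choices in the abstract concrete, below is a minimal sketch of an MoE feed-forward block with a linear router and top-k token-choice gating. The module name, parameter names, and the top-k formulation are illustrative assumptions for exposition, not the authors' implementation.

```python
# Minimal sketch of an MoE feed-forward block with a linear router.
# Names and the top-k token-choice gating are assumptions, not the paper's code.
import torch
import torch.nn as nn


class MoEFeedForward(nn.Module):
    def __init__(self, dim: int, hidden_dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Linear router: a single projection from token features to expert logits.
        self.router = nn.Linear(dim, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden_dim), nn.GELU(), nn.Linear(hidden_dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim) -> flatten so each token is routed independently.
        b, t, d = x.shape
        flat = x.reshape(-1, d)
        logits = self.router(flat)                              # (b*t, num_experts)
        weights, idx = logits.softmax(dim=-1).topk(self.top_k)  # keep k best experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)   # renormalize kept probabilities
        out = torch.zeros_like(flat)
        for e, expert in enumerate(self.experts):
            token_ids, slot = (idx == e).nonzero(as_tuple=True)  # tokens routed to expert e
            if token_ids.numel() == 0:
                continue
            out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(flat[token_ids])
        return out.reshape(b, t, d)
```

Under the placement heuristics discussed above, such a block would replace the dense feed-forward layer only in the last two blocks of the network (Last-2) or in every second block (Every-2), leaving the rest of the architecture unchanged.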

Cite

Text

Videau et al. "Mixture of Experts for Image Classification: What's the Sweet Spot?" Transactions on Machine Learning Research, 2025.

Markdown

[Videau et al. "Mixture of Experts for Image Classification: What's the Sweet Spot?" Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/videau2025tmlr-mixture/)

BibTeX

@article{videau2025tmlr-mixture,
  title     = {{Mixture of Experts for Image Classification: What's the Sweet Spot?}},
  author    = {Videau, Mathurin and Leite, Alessandro and Schoenauer, Marc and Teytaud, Olivier},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/videau2025tmlr-mixture/}
}