Autonomy-of-Experts Models

Abstract

Mixture-of-Experts (MoE) models typically use a router to assign tokens to specific expert modules, activating only a subset of parameters and often outperforming dense models. We argue that the separation between the router’s decision-making and the experts’ execution is a critical yet overlooked issue, leading to suboptimal expert selection and ineffective learning. To address this, we propose Autonomy-of-Experts (AoE), a novel MoE paradigm in which experts autonomously select themselves to process inputs. AoE is based on the insight that an expert is aware of its own capacity to effectively process a token, an awareness reflected in the scale of its internal activations. In AoE, routers are removed; instead, experts pre-compute internal activations for inputs and are ranked by their activation norms. Only the top-ranking experts proceed with the forward pass, while the others abort. The overhead of pre-computing activations is reduced through a low-rank weight factorization. This self-evaluating, then partner-comparing approach ensures improved expert selection and effective learning. We pre-train language models with 700M to 4B parameters, demonstrating that AoE outperforms traditional MoE models with comparable efficiency.
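The router-free selection described in the abstract can be sketched in a few lines. The snippet below is a minimal NumPy illustration, not the authors' implementation: each expert computes its internal activation for a token, experts are ranked by the L2 norm of those activations, and only the top-k finish the forward pass. All names (`aoe_select`, `W_downs`, `W_ups`) are illustrative, and the low-rank factorization trick for cheap pre-computation is omitted for brevity.

```python
import numpy as np

def aoe_select(x, W_downs, W_ups, k):
    """Sketch of AoE-style expert self-selection.

    Each expert pre-computes its internal activation for token x;
    experts are ranked by the norm of that activation, and only the
    top-k proceed with the rest of the forward pass.
    """
    acts = [W_d @ x for W_d in W_downs]          # pre-computed internal activations
    norms = [np.linalg.norm(h) for h in acts]    # an expert's "confidence" signal
    top = np.argsort(norms)[-k:]                 # experts with the largest norms win
    # Only the selected experts complete their computation; the others abort.
    out = sum(W_ups[i] @ np.maximum(acts[i], 0.0) for i in top)
    return out, sorted(top.tolist())
```

In a real model the first expert weight would be factorized into a low-rank product so that only the cheap low-rank projection is pre-computed for every expert, keeping the selection overhead small.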

Cite

Text

Lv et al. "Autonomy-of-Experts Models." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Lv et al. "Autonomy-of-Experts Models." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/lv2025icml-autonomyofexperts/)

BibTeX

@inproceedings{lv2025icml-autonomyofexperts,
  title     = {{Autonomy-of-Experts Models}},
  author    = {Lv, Ang and Xie, Ruobing and Qian, Yining and Wu, Songhao and Sun, Xingwu and Kang, Zhanhui and Wang, Di and Yan, Rui},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {41667--41681},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/lv2025icml-autonomyofexperts/}
}