LocMoE: A Low-Overhead MoE for Large Language Model Training
Cite
Text
Li et al. "LocMoE: A Low-Overhead MoE for Large Language Model Training." International Joint Conference on Artificial Intelligence, 2024. doi:10.24963/ijcai.2024/705
Markdown
[Li et al. "LocMoE: A Low-Overhead MoE for Large Language Model Training." International Joint Conference on Artificial Intelligence, 2024.](https://mlanthology.org/ijcai/2024/li2024ijcai-locmoe/) doi:10.24963/ijcai.2024/705
BibTeX
@inproceedings{li2024ijcai-locmoe,
title = {{LocMoE: A Low-Overhead MoE for Large Language Model Training}},
author = {Li, Jing and Sun, Zhijie and He, Xuan and Zeng, Li and Lin, Yi and Li, Entong and Zheng, Binfan and Zhao, Rongqian and Chen, Xin},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2024},
pages = {6377--6387},
doi = {10.24963/ijcai.2024/705},
url = {https://mlanthology.org/ijcai/2024/li2024ijcai-locmoe/}
}