SCoMoE: Efficient Mixtures of Experts with Structured Communication

Abstract

Mixture-of-Experts (MoE) models are promising architectures for massively multilingual neural machine translation and large language models due to their advantage of sublinear scaling. However, the training of large MoE models is usually bottlenecked by the all-to-all communication (Lepikhin et al., 2020). To reduce the communication cost, we propose SCoMoE, an MoE architecture with structured all-to-all communication, inspired by the hierarchical architecture of the communication topology. SCoMoE encourages data to be communicated across devices through fast intra-accelerator/node communication channels, reducing the data transferred over the slow inter-node communication channel. We slice the data along the sequence dimension (SCoMoE-Seq) into three communication groups and project the data along the feature dimension (SCoMoE-Feat) into low-dimensional representations. To compensate for the potential performance drop caused by the routing locality in SCoMoE, we further propose a token clustering approach that aggregates related tokens from different devices before the MoE layers. The sigmoid gating in the balanced router used for token clustering is replaced with softmax gating with differentiable sorting. Experiments on bilingual and massively multilingual machine translation demonstrate that SCoMoE achieves a speedup of 1.44x over GShard with comparable performance, and substantially outperforms GShard (by 2.8 BLEU) on OPUS-100 with a speedup of 1.25x.
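To make the SCoMoE-Seq idea in the abstract concrete, below is a minimal, hedged sketch of slicing tokens along the sequence dimension into three communication groups (intra-accelerator, intra-node, inter-node). The function name, the ratio parameters, and the fixed slicing order are illustrative assumptions for exposition, not the authors' actual implementation or API.

```python
# Illustrative sketch (not the authors' code): tokens are sliced along the
# sequence dimension into three groups that are routed, respectively, to
# experts on the same accelerator, within the same node, and across nodes.
import torch


def split_sequence_into_comm_groups(tokens, intra_accel_ratio=0.5, intra_node_ratio=0.3):
    """Slice a (seq_len, hidden) token tensor into three communication groups.

    The remaining fraction (1 - intra_accel_ratio - intra_node_ratio) is sent
    through the slow global (inter-node) all-to-all. The ratios here are
    assumed hyperparameters for illustration only.
    """
    seq_len = tokens.size(0)
    n_intra_accel = int(seq_len * intra_accel_ratio)
    n_intra_node = int(seq_len * intra_node_ratio)

    intra_accel = tokens[:n_intra_accel]                              # routed only to local experts
    intra_node = tokens[n_intra_accel:n_intra_accel + n_intra_node]   # routed to experts within the node
    inter_node = tokens[n_intra_accel + n_intra_node:]                # routed via the global all-to-all
    return intra_accel, intra_node, inter_node


if __name__ == "__main__":
    x = torch.randn(1024, 512)  # (sequence length, model dimension)
    groups = split_sequence_into_comm_groups(x)
    print([tuple(g.shape) for g in groups])  # [(512, 512), (307, 512), (205, 512)]
```

In the same spirit, SCoMoE-Feat would instead project the features sent over the slow channel into a lower-dimensional representation before communication; the token clustering described in the abstract is orthogonal and happens before the MoE layers.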

Cite

Text

Zeng and Xiong. "SCoMoE: Efficient Mixtures of Experts with Structured Communication." International Conference on Learning Representations, 2023.

Markdown

[Zeng and Xiong. "SCoMoE: Efficient Mixtures of Experts with Structured Communication." International Conference on Learning Representations, 2023.](https://mlanthology.org/iclr/2023/zeng2023iclr-scomoe/)

BibTeX

@inproceedings{zeng2023iclr-scomoe,
  title     = {{SCoMoE: Efficient Mixtures of Experts with Structured Communication}},
  author    = {Zeng, Zhiyuan and Xiong, Deyi},
  booktitle = {International Conference on Learning Representations},
  year      = {2023},
  url       = {https://mlanthology.org/iclr/2023/zeng2023iclr-scomoe/}
}