OLMoE: Open Mixture-of-Experts Language Models
Abstract
We introduce OLMoE, a fully open, state-of-the-art language model leveraging sparse Mixture-of-Experts (MoE). OLMoE-1B-7B has 7 billion (B) parameters but activates only 1B of them per input token. We pretrain it on 5 trillion tokens and further adapt it to create OLMoE-1B-7B-Instruct. Our models outperform all available models with similar active parameters, even surpassing larger ones like Llama2-13B-Chat and DeepSeekMoE-16B. We present novel findings on MoE training, define and analyze new routing properties showing high specialization in our model, and open-source all our work: model weights, training data, code, and logs.
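To make the "total vs. active parameters" idea concrete, below is a minimal, hypothetical top-k MoE layer in PyTorch: a learned router selects k experts per token, so each token only passes through a small fraction of the layer's parameters. The expert count, hidden sizes, and k are illustrative placeholders, not OLMoE's actual configuration.

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Sketch of a sparse Mixture-of-Experts layer: a router picks the
    top-k experts per token, so only part of the total parameter count
    is used for any single input token. Sizes are illustrative only."""

    def __init__(self, d_model=512, d_hidden=1024, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                       # x: (tokens, d_model)
        logits = self.router(x)                 # (tokens, n_experts)
        probs = logits.softmax(dim=-1)
        weights, idx = torch.topk(probs, self.k, dim=-1)  # (tokens, k)
        out = torch.zeros_like(x)
        # Route each token only through its k selected experts.
        for e, expert in enumerate(self.experts):
            token_ids, slot = (idx == e).nonzero(as_tuple=True)
            if token_ids.numel() > 0:
                out[token_ids] += weights[token_ids, slot].unsqueeze(-1) \
                    * expert(x[token_ids])
        return out

# Example: 4 tokens, each activating 2 of 8 experts.
moe = TopKMoE()
tokens = torch.randn(4, 512)
print(moe(tokens).shape)  # torch.Size([4, 512])
```

In this sketch, increasing `n_experts` grows the total parameter count while the per-token compute stays tied to `k`, which is the property the abstract describes (7B total parameters, roughly 1B active per token).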
Cite
Text
Muennighoff et al. "OLMoE: Open Mixture-of-Experts Language Models." International Conference on Learning Representations, 2025.
Markdown
[Muennighoff et al. "OLMoE: Open Mixture-of-Experts Language Models." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/muennighoff2025iclr-olmoe/)
BibTeX
@inproceedings{muennighoff2025iclr-olmoe,
title = {{OLMoE: Open Mixture-of-Experts Language Models}},
author = {Muennighoff, Niklas and Soldaini, Luca and Groeneveld, Dirk and Lo, Kyle and Morrison, Jacob and Min, Sewon and Shi, Weijia and Walsh, Evan Pete and Tafjord, Oyvind and Lambert, Nathan and Gu, Yuling and Arora, Shane and Bhagia, Akshita and Schwenk, Dustin and Wadden, David and Wettig, Alexander and Hui, Binyuan and Dettmers, Tim and Kiela, Douwe and Farhadi, Ali and Smith, Noah A. and Koh, Pang Wei and Singh, Amanpreet and Hajishirzi, Hannaneh},
booktitle = {International Conference on Learning Representations},
year = {2025},
url = {https://mlanthology.org/iclr/2025/muennighoff2025iclr-olmoe/}
}