OpenELM: An Efficient Language Model Family with Open Training and Inference Framework

Abstract

The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and enabling investigations into data and model biases, as well as potential risks. To this end, we release OpenELM, a state-of-the-art open language model. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the model, leading to enhanced accuracy. For example, with a budget of around one billion parameters, OpenELM exhibits a 2.36% improvement in accuracy compared to OLMo while requiring 2× fewer pre-training tokens. Our source code along with pre-trained model weights and training recipes is available at https://github.com/apple/corenet. OpenELM HuggingFace models can be found at: https://huggingface.co/apple/OpenELM.
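
To make the "layer-wise scaling" idea in the abstract concrete, the sketch below shows one way such a non-uniform allocation can be expressed: per-layer attention-head counts and FFN widths are interpolated between a lower and an upper bound so that later layers receive a larger share of the parameter budget. This is an illustrative sketch only, not the corenet implementation; the function name layerwise_scaling, the bounds alpha/beta, and the head_dim default are assumptions for the example, and the real hyperparameters live in the released training recipes.

# Minimal sketch of layer-wise parameter allocation (illustrative; assumed
# names and values, not the corenet code).
from dataclasses import dataclass

@dataclass
class LayerConfig:
    num_heads: int  # attention heads in this transformer layer
    ffn_dim: int    # hidden width of this layer's feed-forward block

def layerwise_scaling(
    num_layers: int,
    model_dim: int,
    head_dim: int = 64,
    alpha: tuple = (0.5, 1.0),  # assumed bounds scaling attention heads
    beta: tuple = (0.5, 4.0),   # assumed bounds scaling FFN width
):
    configs = []
    for i in range(num_layers):
        t = i / max(num_layers - 1, 1)  # 0 at the first layer, 1 at the last
        a = alpha[0] + (alpha[1] - alpha[0]) * t
        b = beta[0] + (beta[1] - beta[0]) * t
        num_heads = max(1, round(a * model_dim / head_dim))
        ffn_dim = round(b * model_dim)
        configs.append(LayerConfig(num_heads=num_heads, ffn_dim=ffn_dim))
    return configs

# Example: a 16-layer model with a 1024-dimensional hidden state.
for idx, cfg in enumerate(layerwise_scaling(num_layers=16, model_dim=1024)):
    print(f"layer {idx:2d}: heads={cfg.num_heads:2d}, ffn_dim={cfg.ffn_dim}")

Under these assumed bounds, early layers are narrow (fewer heads, smaller FFN) and later layers are wide, which is the kind of uneven per-layer budget the abstract contrasts with a uniform allocation.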

Cite

Text

Mehta et al. "OpenELM: An Efficient Language Model Family with Open Training and Inference Framework." ICML 2024 Workshops: ES-FoMo-II, 2024.

Markdown

[Mehta et al. "OpenELM: An Efficient Language Model Family with Open Training and Inference Framework." ICML 2024 Workshops: ES-FoMo-II, 2024.](https://mlanthology.org/icmlw/2024/mehta2024icmlw-openelm/)

BibTeX

@inproceedings{mehta2024icmlw-openelm,
  title     = {{OpenELM: An Efficient Language Model Family with Open Training and Inference Framework}},
  author    = {Mehta, Sachin and Sekhavat, Mohammad Hossein and Cao, Qingqing and Horton, Maxwell and Jin, Yanzi and Sun, Chenfan and Mirzadeh, Seyed Iman and Najibi, Mahyar and Belenko, Dmitry and Zatloukal, Peter and Rastegari, Mohammad},
  booktitle = {ICML 2024 Workshops: ES-FoMo-II},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/mehta2024icmlw-openelm/}
}