Llemma: An Open Language Model for Mathematics

Azerbayev, Zhangir; Schoelkopf, Hailey; Paster, Keiran; Dos Santos, Marco; McAleer, Stephen Marcus; Jiang, Albert Q.; Deng, Jia; Biderman, Stella; Welleck, Sean

ICLR 2024

Abstract

We present Llemma, a large language model for mathematics. We continue pretraining Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, yielding Llemma. On the MATH benchmark Llemma outperforms all known openly released models, as well as the unreleased Minerva model suite on an equi-parameter basis. Moreover, Llemma is capable of tool use and formal theorem proving without any finetuning. We openly release all artifacts, including 7 billion and 34 billion parameter models, the Proof-Pile-2, and code to replicate our experiments.

Cite

Text

Azerbayev et al. "Llemma: An Open Language Model for Mathematics." International Conference on Learning Representations, 2024.

Markdown

[Azerbayev et al. "Llemma: An Open Language Model for Mathematics." International Conference on Learning Representations, 2024.](https://mlanthology.org/iclr/2024/azerbayev2024iclr-llemma/)

BibTeX

@inproceedings{azerbayev2024iclr-llemma,
  title     = {{Llemma: An Open Language Model for Mathematics}},
  author    = {Azerbayev, Zhangir and Schoelkopf, Hailey and Paster, Keiran and Dos Santos, Marco and McAleer, Stephen Marcus and Jiang, Albert Q. and Deng, Jia and Biderman, Stella and Welleck, Sean},
  booktitle = {International Conference on Learning Representations},
  year      = {2024},
  url       = {https://mlanthology.org/iclr/2024/azerbayev2024iclr-llemma/}
}