PutnamBench: A Multilingual Competition-Mathematics Benchmark for Formal Theorem-Proving

Abstract

We present PutnamBench, a new multilingual evaluation benchmark for formal theorem-proving. PutnamBench consists of formalizations of problems sourced from the William Lowell Putnam Mathematical Competition, the premier undergraduate-level mathematics competition in North America. All the problem statements come with formalizations in Lean 4 and Isabelle; a substantial subset has Coq formalizations as well. In total, PutnamBench comprises 1337 hand-written formalizations across the three proof assistants, and it aims to benchmark the next generation of theorem-proving algorithms for competition mathematics. Proving the theorems requires significant problem-solving ability and proficiency in a broad range of topics taught in undergraduate mathematics courses. We evaluate several established neural and symbolic theorem provers using PutnamBench. These approaches can solve only a handful of the problems, establishing our benchmark as a difficult open challenge for research on formal theorem-proving. PutnamBench is available at https://github.com/trishullab/PUTNAM.
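
To illustrate the format of such entries, here is a minimal, hypothetical sketch of a Lean 4 theorem statement in the style described above. The theorem name, the particular statement, and the reliance on Mathlib are illustrative assumptions on our part, not an actual PutnamBench problem; each real entry likewise states a theorem and leaves the proof to be supplied by the prover under evaluation.

import Mathlib

/-- Hypothetical example (not taken from PutnamBench): the sum of the first
    n odd numbers equals n squared. The proof is left as `sorry`, which is
    the obligation a theorem-proving system must discharge. -/
theorem putnam_example_sum_of_odds (n : ℕ) :
    ∑ k ∈ Finset.range n, (2 * k + 1) = n ^ 2 := by
  sorry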

Cite

Text

Tsoukalas et al. "PutnamBench: A Multilingual Competition-Mathematics Benchmark for Formal Theorem-Proving." ICML 2024 Workshops: AI4MATH, 2024.

Markdown

[Tsoukalas et al. "PutnamBench: A Multilingual Competition-Mathematics Benchmark for Formal Theorem-Proving." ICML 2024 Workshops: AI4MATH, 2024.](https://mlanthology.org/icmlw/2024/tsoukalas2024icmlw-putnambench/)

BibTeX

@inproceedings{tsoukalas2024icmlw-putnambench,
  title     = {{PutnamBench: A Multilingual Competition-Mathematics Benchmark for Formal Theorem-Proving}},
  author    = {Tsoukalas, George and Lee, Jasper and Jennings, John and Xin, Jimmy and Ding, Michelle and Jennings, Michael and Thakur, Amitayush and Chaudhuri, Swarat},
  booktitle = {ICML 2024 Workshops: AI4MATH},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/tsoukalas2024icmlw-putnambench/}
}