PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition

Abstract

We present PutnamBench, a new multi-language benchmark for evaluating the ability of neural theorem-provers to solve competition mathematics problems. PutnamBench consists of 1692 hand-constructed formalizations of 640 theorems sourced from the William Lowell Putnam Mathematical Competition, the premier undergraduate-level mathematics competition in North America. All the problems have formalizations in Lean 4 and Isabelle; a substantial subset also has Coq formalizations. PutnamBench requires significant problem-solving ability and proficiency in a broad range of topics taught in undergraduate mathematics courses. We use PutnamBench to evaluate several established neural and symbolic theorem-provers. These approaches can only solve a handful of the PutnamBench problems, establishing the benchmark as a difficult open challenge for research on neural theorem-proving. PutnamBench is available at https://github.com/trishullab/PutnamBench.

Cite

Text

Tsoukalas et al. "PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition." Neural Information Processing Systems, 2024. doi:10.52202/079017-0368

Markdown

[Tsoukalas et al. "PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/tsoukalas2024neurips-putnambench/) doi:10.52202/079017-0368

BibTeX

@inproceedings{tsoukalas2024neurips-putnambench,
  title     = {{PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition}},
  author    = {Tsoukalas, George and Lee, Jasper and Jennings, John and Xin, Jimmy and Ding, Michelle and Jennings, Michael and Thakur, Amitayush and Chaudhuri, Swarat},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-0368},
  url       = {https://mlanthology.org/neurips/2024/tsoukalas2024neurips-putnambench/}
}