PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition
Abstract
We present PutnamBench, a new multi-language benchmark for evaluating the ability of neural theorem-provers to solve competition mathematics problems. PutnamBench consists of 1692 hand-constructed formalizations of 640 theorems sourced from the William Lowell Putnam Mathematical Competition, the premier undergraduate-level mathematics competition in North America. All the problems have formalizations in Lean 4 and Isabelle; a substantial subset also has Coq formalizations. PutnamBench requires significant problem-solving ability and proficiency in a broad range of topics taught in undergraduate mathematics courses. We use PutnamBench to evaluate several established neural and symbolic theorem-provers. These approaches can only solve a handful of the PutnamBench problems, establishing the benchmark as a difficult open challenge for research on neural theorem-proving. PutnamBench is available at https://github.com/trishullab/PutnamBench.
Cite
Text
Tsoukalas et al. "PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition." Neural Information Processing Systems, 2024. doi:10.52202/079017-0368Markdown
[Tsoukalas et al. "PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/tsoukalas2024neurips-putnambench/) doi:10.52202/079017-0368BibTeX
@inproceedings{tsoukalas2024neurips-putnambench,
title = {{PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition}},
author = {Tsoukalas, George and Lee, Jasper and Jennings, John and Xin, Jimmy and Ding, Michelle and Jennings, Michael and Thakur, Amitayush and Chaudhuri, Swarat},
booktitle = {Neural Information Processing Systems},
year = {2024},
doi = {10.52202/079017-0368},
url = {https://mlanthology.org/neurips/2024/tsoukalas2024neurips-putnambench/}
}