HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics
Abstract
Advanced applied mathematics problems are underrepresented in existing Large Language Model (LLM) benchmark datasets. To address this, we introduce $\textbf{HARDMath}$, a dataset inspired by a graduate course on asymptotic methods, featuring challenging applied mathematics problems that require analytical approximation techniques. These problems demand a combination of mathematical reasoning, computational tools, and subjective judgment, making them difficult for LLMs. Our framework auto-generates a large number of problems with solutions validated against numerical ground truths. We evaluate both open- and closed-source LLMs on $\textbf{HARDMath-mini}$, a sub-sampled test set of 366 problems, as well as on 40 word problems formulated in applied science contexts. Even leading closed-source models like GPT-4 achieve only 43.8% overall accuracy with few-shot Chain-of-Thought prompting, and all models perform significantly worse than they do on existing mathematics benchmark datasets. We additionally conduct a detailed error analysis to gain insights into the failure cases of LLMs. These results demonstrate the limitations of current LLM performance on advanced graduate-level applied math problems and underscore the importance of datasets like $\textbf{HARDMath}$ for advancing the mathematical abilities of LLMs.
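As a toy illustration of the validation strategy described above (analytical approximations checked against numerical ground truths), the following Python sketch, which is not taken from the paper, compares a leading-order asymptotic expansion of a Laplace-type integral against direct numerical quadrature; the integral, expansion, and function names are illustrative assumptions, not part of HARDMath itself.

import math
import numpy as np
from scipy.integrate import quad

def numeric_ground_truth(x):
    # Numerical "ground truth" for I(x) = \int_0^\infty e^{-x t} / (1 + t) dt
    val, _ = quad(lambda t: np.exp(-x * t) / (1.0 + t), 0.0, np.inf)
    return val

def asymptotic_series(x, n_terms=3):
    # Watson's-lemma expansion for large x: I(x) ~ 1/x - 1/x^2 + 2/x^3 - ...
    return sum((-1) ** n * math.factorial(n) / x ** (n + 1) for n in range(n_terms))

for x in (5.0, 20.0, 100.0):
    num = numeric_ground_truth(x)
    asy = asymptotic_series(x)
    print(f"x={x:6.1f}  numeric={num:.6e}  asymptotic={asy:.6e}  "
          f"rel. error={abs(num - asy) / num:.2e}")

The relative error shrinks as x grows, which is the kind of regime-dependent agreement between an analytical approximation and a numerical solution that the abstract describes.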
Cite
Text
Fan et al. "HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics." International Conference on Learning Representations, 2025.
Markdown
[Fan et al. "HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/fan2025iclr-hardmath/)
BibTeX
@inproceedings{fan2025iclr-hardmath,
title = {{HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics}},
author = {Fan, Jingxuan and Martinson, Sarah and Wang, Erik Y. and Hausknecht, Kaylie and Brenner, Jonah and Liu, Danxian and Peng, Nianli and Wang, Corey and Brenner, Michael},
booktitle = {International Conference on Learning Representations},
year = {2025},
url = {https://mlanthology.org/iclr/2025/fan2025iclr-hardmath/}
}