CogMath: Assessing LLMs’ Authentic Mathematical Ability from a Human Cognitive Perspective

Liu, Jiayu; Huang, Zhenya; Dai, Wei; Cheng, Cheng; Wu, Jinze; Sha, Jing; Li, Song; Liu, Qi; Wang, Shijin; Chen, Enhong

ICML 2025 pp. 38692-38707

Abstract

Although large language models (LLMs) show promise in solving complex mathematical tasks, existing evaluation paradigms rely solely on a coarse measure of overall answer accuracy, which is insufficient for assessing their authentic capabilities. In this paper, we propose CogMath, which comprehensively assesses LLMs’ mathematical abilities through the lens of human cognition. Specifically, inspired by psychological theories, CogMath formalizes the human reasoning process into 3 stages: problem comprehension, problem solving, and solution summarization. Within these stages, we investigate perspectives such as numerical calculation, knowledge, and counterfactuals, and design a total of 9 fine-grained evaluation dimensions. In each dimension, we develop an “Inquiry-Judge-Reference” multi-agent system to generate inquiries that assess LLMs’ mastery of that dimension. An LLM is considered to truly master a problem only when it excels in all inquiries from the 9 dimensions. By applying CogMath to three benchmarks, we reveal that the mathematical capabilities of 7 mainstream LLMs are overestimated by 30%-40%. Moreover, we locate their strengths and weaknesses across specific stages/dimensions, offering in-depth insights to further enhance their reasoning abilities.
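
The mastery criterion stated in the abstract (a problem counts as mastered only when the model passes the inquiries in every dimension) can be illustrated with a minimal Python sketch. The function names, the record layout, and the placeholder dimension keys (`dim_1`, `dim_2`, ...) are assumptions made for illustration and are not taken from the paper; the toy records are made up and are not experimental results.

```python
from typing import Dict, List


def mastered(dimension_results: Dict[str, bool]) -> bool:
    """Return True only if every dimension's inquiry was passed.

    `dimension_results` maps a (hypothetical) dimension name to whether the
    model passed the inquiry for that dimension on one problem.
    An empty record is treated as not mastered.
    """
    return bool(dimension_results) and all(dimension_results.values())


def strict_mastery_rate(results: List[Dict[str, bool]]) -> float:
    """Fraction of problems on which all dimensions were passed."""
    if not results:
        return 0.0
    return sum(mastered(r) for r in results) / len(results)


# Toy, made-up records with placeholder dimension names (illustration only).
toy_results = [
    {"dim_1": True, "dim_2": True, "dim_3": True},   # passes every dimension
    {"dim_1": True, "dim_2": False, "dim_3": True},  # fails one dimension
]

print(strict_mastery_rate(toy_results))
```

Under this all-or-nothing rule, a single failed dimension is enough to mark a problem as not mastered, which is why a strict rate of this kind can fall below plain answer accuracy.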

Cite

Text

Liu et al. "CogMath: Assessing LLMs’ Authentic Mathematical Ability from a Human Cognitive Perspective." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Liu et al. "CogMath: Assessing LLMs’ Authentic Mathematical Ability from a Human Cognitive Perspective." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/liu2025icml-cogmath/)

BibTeX

@inproceedings{liu2025icml-cogmath,
  title     = {{CogMath: Assessing LLMs’ Authentic Mathematical Ability from a Human Cognitive Perspective}},
  author    = {Liu, Jiayu and Huang, Zhenya and Dai, Wei and Cheng, Cheng and Wu, Jinze and Sha, Jing and Li, Song and Liu, Qi and Wang, Shijin and Chen, Enhong},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {38692-38707},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/liu2025icml-cogmath/}
}