CogMath: Assessing LLMs’ Authentic Mathematical Ability from a Human Cognitive Perspective
Abstract
Although large language models (LLMs) show promise in solving complex mathematical tasks, existing evaluation paradigms rely solely on a coarse measure of overall answer accuracy, which is insufficient for assessing their authentic capabilities. In this paper, we propose CogMath, which comprehensively assesses LLMs’ mathematical abilities through the lens of human cognition. Specifically, inspired by psychological theories, CogMath formalizes the human reasoning process into 3 stages: problem comprehension, problem solving, and solution summarization. Within these stages, we investigate perspectives such as numerical calculation, knowledge, and counterfactuals, and design a total of 9 fine-grained evaluation dimensions. For each dimension, we develop an “Inquiry-Judge-Reference” multi-agent system to generate inquiries that assess LLMs’ mastery of that dimension. An LLM is considered to truly master a problem only when it excels in all inquiries from the 9 dimensions. By applying CogMath to three benchmarks, we reveal that the mathematical capabilities of 7 mainstream LLMs are overestimated by 30%-40%. Moreover, we locate their strengths and weaknesses across specific stages and dimensions, offering in-depth insights to further enhance their reasoning abilities.
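The all-dimensions criterion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the placeholder dimension labels (`dim_1` to `dim_9`), and the input format (one pass/fail flag per dimension per problem) are assumptions made for illustration only.

```python
from typing import Dict, List

# The abstract states there are 9 dimensions; their names are not given there,
# so the labels used below are hypothetical placeholders.
NUM_DIMENSIONS = 9


def masters_problem(dimension_passed: Dict[str, bool]) -> bool:
    """Return True only if all 9 dimensions are present and every one passed."""
    return len(dimension_passed) == NUM_DIMENSIONS and all(dimension_passed.values())


def mastery_rate(results: List[Dict[str, bool]]) -> float:
    """Fraction of problems that pass the inquiries in every dimension."""
    if not results:
        return 0.0
    return sum(masters_problem(r) for r in results) / len(results)


# Hypothetical usage: one problem passes everything, one fails a single dimension.
all_pass = {f"dim_{i}": True for i in range(1, NUM_DIMENSIONS + 1)}
one_fail = dict(all_pass, dim_4=False)
rate = mastery_rate([all_pass, one_fail])
```

Under this strict rule a single failed dimension removes a problem from the mastered set, which is why such a rate can only be at or below plain answer accuracy on the same problems.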
Cite
Text
Liu et al. "CogMath: Assessing LLMs’ Authentic Mathematical Ability from a Human Cognitive Perspective." Proceedings of the 42nd International Conference on Machine Learning, 2025.
Markdown
[Liu et al. "CogMath: Assessing LLMs’ Authentic Mathematical Ability from a Human Cognitive Perspective." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/liu2025icml-cogmath/)
BibTeX
@inproceedings{liu2025icml-cogmath,
title = {{CogMath: Assessing LLMs’ Authentic Mathematical Ability from a Human Cognitive Perspective}},
author = {Liu, Jiayu and Huang, Zhenya and Dai, Wei and Cheng, Cheng and Wu, Jinze and Sha, Jing and Li, Song and Liu, Qi and Wang, Shijin and Chen, Enhong},
booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
year = {2025},
pages = {38692--38707},
volume = {267},
url = {https://mlanthology.org/icml/2025/liu2025icml-cogmath/}
}