PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts

Abstract

In this paper, we introduce **PolyMath**, a multilingual mathematical reasoning benchmark covering 18 languages and 4 difficulty levels ranging from easy to hard. Our benchmark ensures comprehensive difficulty coverage, language diversity, and high-quality translation, making it a highly discriminative multilingual mathematical benchmark in the era of reasoning LLMs. We conduct a comprehensive evaluation of advanced LLMs and find that even Qwen-3-235B-A22B-Thinking and Gemini-2.5-pro achieve benchmark scores of only 54.6 and 52.2, respectively, with about 40% accuracy at the highest difficulty level. From a language perspective, our benchmark reveals several key challenges for LLMs in multilingual reasoning: (1) reasoning performance varies widely across languages; (2) input-output language consistency is low in reasoning LLMs and may be correlated with performance; (3) thinking length differs significantly by language. Additionally, we demonstrate that controlling the output language in the instructions can affect reasoning performance, especially for some low-resource languages, suggesting a promising direction for improving multilingual capabilities in LLMs.

Cite

Text

Wang et al. "PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts." Advances in Neural Information Processing Systems, 2025.

Markdown

[Wang et al. "PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/wang2025neurips-polymath/)

BibTeX

@inproceedings{wang2025neurips-polymath,
  title     = {{PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts}},
  author    = {Wang, Yiming and Zhang, Pei and Tang, Jialong and Wei, Hao-Ran and Yang, Baosong and Wang, Rui and Sun, Chenshu and Sun, Feitong and Zhang, Jiran and Wu, Junxuan and Cang, Qiqian and Zhang, Yichang and Huang, Fei and Lin, Junyang and Huang, Fei and Zhou, Jingren},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/wang2025neurips-polymath/}
}