VisioMath: Benchmarking Figure-Based Mathematical Reasoning in LMMs
Abstract
Large multimodal models (LMMs) have achieved remarkable progress in integrating vision and language, enabling strong performance across perception, reasoning, and domain-specific tasks. However, their capacity to reason over multiple, visually similar inputs remains insufficiently explored. Such fine-grained comparative reasoning is central to real-world tasks, especially in mathematics and education, where learners must often distinguish between nearly identical diagrams to identify correct solutions. To address this gap, we present VisioMath, a curated benchmark of 1,800 high-quality K–12 mathematics problems in which all candidate answers are diagrams with subtle visual similarities. A comprehensive evaluation of state-of-the-art LMMs, covering both leading closed-source systems and widely adopted open-source models, reveals a consistent decline in accuracy as inter-image similarity increases. Analysis indicates that the dominant failure mode stems from image–text misalignment: rather than grounding reasoning in textual cues, models often resort to shallow positional heuristics, resulting in systematic errors. We further explore three alignment-oriented strategies, spanning training-free approaches and fine-tuning, and achieve substantial accuracy gains. We hope that VisioMath will serve as a rigorous benchmark and catalyst for developing LMMs toward deeper diagram understanding, precise comparative reasoning, and grounded multi-image–text integration. The code and dataset are available at https://github.com/Nefefilibata/VisioMath.
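The abstract reports two findings: accuracy falls as inter-image similarity rises, and models lean on positional heuristics. The sketch below shows one way such an analysis could be tabulated. It assumes each evaluated problem already has a similarity score in [0, 1], a ground-truth option letter, and the model's chosen letter. The `Record` schema, the function names, and the binning scheme are hypothetical illustrations. They are not the VisioMath codebase's API or the paper's exact protocol.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Record:
    """One evaluated problem (hypothetical schema, not the released VisioMath format)."""
    similarity: float  # assumed precomputed inter-image similarity score in [0, 1]
    answer: str        # ground-truth option letter, e.g. "A"
    prediction: str    # option letter chosen by the model


def accuracy_by_similarity(records, edges):
    """Bin records into [edges[i], edges[i+1]) and return accuracy per non-empty bin.

    The last bin is also closed on the right, so a score equal to edges[-1] is counted.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in records:
        for i in range(len(edges) - 1):
            lo, hi = edges[i], edges[i + 1]
            is_last = i == len(edges) - 2
            if lo <= r.similarity < hi or (is_last and r.similarity == hi):
                totals[i] += 1
                hits[i] += r.prediction == r.answer  # bool adds as 0 or 1
                break
    return {
        (edges[i], edges[i + 1]): hits[i] / totals[i]
        for i in range(len(edges) - 1)
        if totals[i] > 0
    }


def position_distribution(records, field):
    """Fraction of records whose `field` ("answer" or "prediction") is each option letter."""
    counts = Counter(getattr(r, field) for r in records)
    n = len(records)
    return {letter: counts[letter] / n for letter in sorted(counts)}
```

Comparing `position_distribution(records, "prediction")` with `position_distribution(records, "answer")` is one simple signal of positional bias. Suppose predictions pile onto one letter far more often than correct answers do. That skew suggests the model is choosing by position rather than by content. Any records you pass in must come from your own evaluation runs, since this sketch does not load the released dataset.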
Cite
Li et al. "VisioMath: Benchmarking Figure-Based Mathematical Reasoning in LMMs." International Conference on Learning Representations, 2026.
@inproceedings{li2026iclr-visiomath,
title = {{VisioMath: Benchmarking Figure-Based Mathematical Reasoning in LMMs}},
author = {Li, Can and Liu, Ying and Zhang, Ting and Wang, Mei and Huang, Hua},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/li2026iclr-visiomath/}
}