VisioMath: Benchmarking Figure-Based Mathematical Reasoning in LMMs

Li, Can; Liu, Ying; Zhang, Ting; Wang, Mei; Huang, Hua

ICLR 2026

Abstract

Large multimodal models (LMMs) have achieved remarkable progress in integrating vision and language, enabling strong performance across perception, reasoning, and domain-specific tasks. However, their capacity to reason over multiple, visually similar inputs remains insufficiently explored. Such fine-grained comparative reasoning is central to real-world tasks, especially in mathematics and education, where learners must often distinguish between nearly identical diagrams to identify correct solutions. To address this gap, we present VisioMath, a curated benchmark of 1,800 high-quality K–12 mathematics problems in which all candidate answers are diagrams with subtle visual similarities. A comprehensive evaluation of state-of-the-art LMMs, covering both leading closed-source systems and widely adopted open-source models, reveals a consistent decline in accuracy as inter-image similarity increases. Analysis indicates that the dominant failure mode stems from image–text misalignment: rather than grounding reasoning in textual cues, models often resort to shallow positional heuristics, resulting in systematic errors. We further explore three alignment-oriented strategies, spanning training-free approaches and finetuning, and achieve substantial accuracy gains. We hope that VisioMath will serve as a rigorous benchmark and catalyst for developing LMMs toward deeper diagram understanding, precise comparative reasoning, and grounded multi-image–text integration. The code and dataset are available at https://github.com/Nefefilibata/VisioMath.
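
The abstract reports that accuracy declines as inter-image similarity increases. As a minimal sketch of how such a trend could be tabulated, the Python snippet below groups per-problem outcomes into similarity bins and computes accuracy per bin. It is not the authors' evaluation code: the function name `accuracy_by_similarity`, the assumption that each problem carries a similarity score in [0, 1], the bin edges, and the toy records are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def accuracy_by_similarity(records, edges=(0.0, 0.5, 0.75, 1.0)):
    """Group (similarity, is_correct) pairs into bins and return per-bin accuracy.

    Bins are half-open [lo, hi), except the last one, which also includes hi.
    The similarity score and the bin edges are assumptions, not the paper's.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    bins = list(zip(edges[:-1], edges[1:]))
    for similarity, is_correct in records:
        for i, (lo, hi) in enumerate(bins):
            is_last = i == len(bins) - 1
            if lo <= similarity < hi or (is_last and similarity == hi):
                totals[(lo, hi)] += 1
                hits[(lo, hi)] += int(is_correct)
                break
    # Skip empty bins to avoid division by zero.
    return {b: hits[b] / totals[b] for b in bins if totals[b]}


# Toy records, invented for illustration only (not results from the paper).
toy_records = [
    (0.20, True), (0.30, True), (0.40, False), (0.45, True),
    (0.60, True), (0.70, False),
    (0.80, False), (0.90, False), (1.00, True), (0.95, False),
]

result = accuracy_by_similarity(toy_records)
for (lo, hi), acc in result.items():
    print(f"similarity {lo:.2f} to {hi:.2f}: accuracy {acc:.2f}")
```

With real data, each record would pair a model's correctness on one problem with whatever similarity measure is computed over that problem's answer diagrams; a downward trend across bins would correspond to the decline described above.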

Cite

Text

Li et al. "VisioMath: Benchmarking Figure-Based Mathematical Reasoning in LMMs." International Conference on Learning Representations, 2026.

Markdown

[Li et al. "VisioMath: Benchmarking Figure-Based Mathematical Reasoning in LMMs." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/li2026iclr-visiomath/)

BibTeX

@inproceedings{li2026iclr-visiomath,
  title     = {{VisioMath: Benchmarking Figure-Based Mathematical Reasoning in LMMs}},
  author    = {Li, Can and Liu, Ying and Zhang, Ting and Wang, Mei and Huang, Hua},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/li2026iclr-visiomath/}
}