GeoBench: Rethinking Multimodal Geometric Problem-Solving via Hierarchical Evaluation

Feng, Yuan; Yang, Yue; He, Xiaohan; Zhao, Jiatong; Chen, Jianlong; Fu, Daocheng; Liu, Qi; Xia, Renqiu; Zhang, Bo; Yan, Junchi

ICLR 2026

Abstract

Geometric problem solving is a critical branch of mathematical reasoning that requires precise analysis of shapes and spatial relationships. Current evaluations of geometric reasoning in vision-language models (VLMs) face three limitations: the risk of test data contamination from textbook-based benchmarks, an overemphasis on final answers over reasoning processes, and insufficient diagnostic granularity. To address these issues, we present GeoBench, a hierarchical benchmark featuring four reasoning levels in geometric problem-solving: Visual Perception, Goal-Oriented Planning, Rigorous Theorem Application, and Self-Reflective Backtracking. Through six formally verified tasks generated via TrustGeoGen, we systematically assess capabilities ranging from attribute extraction to logical error correction. Experiments reveal that while reasoning models like OpenAI-o3 outperform general-purpose multimodal large language models (MLLMs), performance declines significantly as task complexity increases. Key findings show that sub-goal decomposition and irrelevant premise filtering critically influence final problem-solving accuracy, whereas Chain-of-Thought prompting unexpectedly degrades performance on some tasks. These findings establish GeoBench as a comprehensive benchmark and offer actionable guidelines for developing geometric problem-solving systems. Our benchmark and code are released at https://github.com/FrontierX-Lab/GeoBench.

Cite

Text

Feng et al. "GeoBench: Rethinking Multimodal Geometric Problem-Solving via Hierarchical Evaluation." International Conference on Learning Representations, 2026.

Markdown

[Feng et al. "GeoBench: Rethinking Multimodal Geometric Problem-Solving via Hierarchical Evaluation." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/feng2026iclr-geobench/)

BibTeX

@inproceedings{feng2026iclr-geobench,
  title     = {{GeoBench: Rethinking Multimodal Geometric Problem-Solving via Hierarchical Evaluation}},
  author    = {Feng, Yuan and Yang, Yue and He, Xiaohan and Zhao, Jiatong and Chen, Jianlong and Fu, Daocheng and Liu, Qi and Xia, Renqiu and Zhang, Bo and Yan, Junchi},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/feng2026iclr-geobench/}
}