SpatialViz-Bench: A Cognitively-Grounded Benchmark for Diagnosing Spatial Visualization in MLLMs

Wang, Siting; Pei, Minnan; Sun, Luoyang; Deng, Cheng; Li, Yuchen; Shao, Kun; Tian, Zheng; Zhang, Haifeng; Wang, Jun

ICLR 2026

Abstract

Humans can imagine and manipulate visual images mentally, a capability known as spatial visualization. While many multi-modal benchmarks assess reasoning over visible visual information, the ability to infer unseen relationships through spatial visualization remains insufficiently evaluated as a spatial skill. Moreover, existing evaluations often rely on publicly sourced problems from IQ tests or math competitions, which risks data contamination and compromises assessment reliability. To address this, we introduce SpatialViz-Bench, a comprehensive multi-modal benchmark for spatial visualization with 12 tasks across 4 sub-abilities, comprising 1,180 programmatically generated problems. Its scalable generation framework allows the benchmark to be expanded, supporting fair and continuously reliable evaluation. Our evaluation of 27 Multi-modal Large Language Models (MLLMs) reveals wide performance variations, demonstrates the benchmark's strong discriminative power, and uncovers counter-intuitive findings, such as Chain-of-Thought (CoT) prompting paradoxically degrading accuracy on open-source models. Through statistical and qualitative analysis of error types, SpatialViz-Bench shows that state-of-the-art MLLMs remain deficient at spatial visualization tasks, addressing a significant gap in the field.
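Because the problems are generated programmatically, each answer key can be computed rather than taken from public exams, and fresh items can be drawn on demand. As an illustration only, the sketch below builds a hypothetical 2D mental-rotation item from a random seed: the correct option is a rotated copy of a reference grid, and the distractors are mirror images. The grid representation, the mirror-image distractor scheme, and every name (`make_item`, `rotations`, `RotationItem`) are assumptions for this example; this is not the authors' generator or a reproduction of any of the benchmark's 12 tasks.

```python
import random
from dataclasses import dataclass

# A square binary grid; 1 = filled cell. Hypothetical representation.
Grid = tuple[tuple[int, ...], ...]


def rotate90(g: Grid) -> Grid:
    """Rotate a square grid 90 degrees clockwise."""
    return tuple(zip(*g[::-1]))


def mirror(g: Grid) -> Grid:
    """Reflect a grid left-to-right."""
    return tuple(row[::-1] for row in g)


def rotations(g: Grid) -> list[Grid]:
    """Return the grid rotated by 0, 90, 180 and 270 degrees."""
    out = [g]
    for _ in range(3):
        out.append(rotate90(out[-1]))
    return out


def is_valid(g: Grid) -> bool:
    """Keep only grids with 4 distinct rotations whose mirror is not a rotation,
    so every mirror-image distractor is unambiguously wrong."""
    rots = rotations(g)
    return len(set(rots)) == 4 and mirror(g) not in rots


@dataclass
class RotationItem:
    reference: Grid
    options: list[Grid]
    answer: int  # index of the correct option


def make_item(seed: int, n: int = 4, n_options: int = 4) -> RotationItem:
    """Deterministically generate one item from a seed (hypothetical generator)."""
    rng = random.Random(seed)
    while True:  # resample until the grid satisfies the validity constraints
        g = tuple(tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(n))
        if is_valid(g):
            break
    rots = rotations(g)
    correct = rots[rng.randint(1, 3)]  # a non-trivial rotation
    # Mirrors of distinct rotations are distinct and never equal any rotation.
    distractors = rng.sample([mirror(r) for r in rots], n_options - 1)
    options = distractors + [correct]
    rng.shuffle(options)
    return RotationItem(g, options, options.index(correct))
```

Seeding makes each item reproducible for auditing, while new seeds yield unseen items, which is one way a generator of this kind can limit contamination.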

Cite

Text

Wang et al. "SpatialViz-Bench: A Cognitively-Grounded Benchmark for Diagnosing Spatial Visualization in MLLMs." International Conference on Learning Representations, 2026.

Markdown

[Wang et al. "SpatialViz-Bench: A Cognitively-Grounded Benchmark for Diagnosing Spatial Visualization in MLLMs." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/wang2026iclr-spatialvizbench/)

BibTeX

@inproceedings{wang2026iclr-spatialvizbench,
  title     = {{SpatialViz-Bench: A Cognitively-Grounded Benchmark for Diagnosing Spatial Visualization in MLLMs}},
  author    = {Wang, Siting and Pei, Minnan and Sun, Luoyang and Deng, Cheng and Li, Yuchen and Shao, Kun and Tian, Zheng and Zhang, Haifeng and Wang, Jun},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/wang2026iclr-spatialvizbench/}
}