SpatialViz-Bench: A Cognitively-Grounded Benchmark for Diagnosing Spatial Visualization in MLLMs
Abstract
Humans can imagine and manipulate visual images mentally, a capability known as \textit{spatial visualization}. While many multi-modal benchmarks assess reasoning on visible visual information, the ability to infer unseen relationships through spatial visualization remains insufficiently evaluated. Existing evaluations also rely on publicly sourced problems from IQ tests or math competitions, which risks data contamination and compromises assessment reliability. To this end, we introduce \textbf{\textit{SpatialViz-Bench}}, a comprehensive multi-modal benchmark for \textit{spatial visualization} with \emph{12} tasks across \emph{4} sub-abilities, comprising \emph{1,180} programmatically generated problems. Its generation framework is scalable and can be expanded to keep evaluations fair and continuously reliable. Our evaluation of \emph{27} Multi-modal Large Language Models (MLLMs) reveals wide performance variations, demonstrates the benchmark's strong discriminative power, and uncovers counter-intuitive findings, notably that Chain-of-Thought (CoT) prompting degrades accuracy on open-source models. Through statistical and qualitative analysis of error types, SpatialViz-Bench demonstrates that state-of-the-art MLLMs exhibit deficiencies in \textit{spatial visualization} tasks, thereby addressing a significant gap in the field.
Cite
Text
Wang et al. "SpatialViz-Bench: A Cognitively-Grounded Benchmark for Diagnosing Spatial Visualization in MLLMs." International Conference on Learning Representations, 2026.
Markdown
[Wang et al. "SpatialViz-Bench: A Cognitively-Grounded Benchmark for Diagnosing Spatial Visualization in MLLMs." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/wang2026iclr-spatialvizbench/)
BibTeX
@inproceedings{wang2026iclr-spatialvizbench,
title = {{SpatialViz-Bench: A Cognitively-Grounded Benchmark for Diagnosing Spatial Visualization in MLLMs}},
author = {Wang, Siting and Pei, Minnan and Sun, Luoyang and Deng, Cheng and Li, Yuchen and Shao, Kun and Tian, Zheng and Zhang, Haifeng and Wang, Jun},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/wang2026iclr-spatialvizbench/}
}