Pursuing Minimal Sufficiency in Spatial Reasoning
Abstract
Spatial reasoning, the ability to ground language in 3D understanding, remains a persistent challenge for Vision-Language Models (VLMs). We identify two fundamental bottlenecks: inadequate 3D understanding capabilities stemming from 2D-centric pre-training, and reasoning failures induced by redundant 3D information. To address these, we first construct a Minimal Sufficient Set (MSS) of information before answering a given question: a compact selection of 3D perception results from expert models. We introduce MSSR (Minimal Sufficient Spatial Reasoner), a dual-agent framework that implements this principle. A Perception Agent programmatically queries 3D scenes using a versatile perception toolbox to extract sufficient information, including a novel SOG (Situated Orientation Grounding) module that robustly extracts language-grounded directions. A Reasoning Agent then iteratively refines this information to pursue minimality, pruning redundant details and requesting missing ones in a closed loop until the MSS is curated. Extensive experiments demonstrate that our method, by explicitly pursuing both sufficiency and minimality, significantly improves accuracy and achieves state-of-the-art performance across two challenging benchmarks. Furthermore, our framework produces interpretable reasoning paths, offering a promising source of high-quality training data for future models. Source code will be made publicly available.
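The closed loop described in the abstract (extract, prune redundant details, request missing ones, repeat until nothing changes) can be illustrated with a toy sketch. This is not the authors' implementation: `curate_mss`, `Review`, `toy_perceive`, `toy_review`, the `SCENE` dictionary, and the `max_rounds` cap are all hypothetical names introduced here, and the two callables merely stand in for the VLM-driven Perception and Reasoning Agents.

```python
from dataclasses import dataclass, field
from typing import Callable

# One entry per 3D perception result, keyed by a name such as "chair.position".
Facts = dict[str, object]


@dataclass
class Review:
    """Hypothetical verdict of the reasoning step on the current fact set."""
    redundant: set[str] = field(default_factory=set)  # keys to prune
    missing: set[str] = field(default_factory=set)    # keys to request


def curate_mss(
    perceive: Callable[[set[str]], Facts],
    review: Callable[[Facts], Review],
    initial_request: set[str],
    max_rounds: int = 5,  # assumed safety cap, not taken from the paper
) -> Facts:
    """Prune and extend the fact set until the reviewer asks for no changes."""
    facts = perceive(initial_request)
    for _ in range(max_rounds):
        verdict = review(facts)
        if not verdict.redundant and not verdict.missing:
            break  # nothing to prune or request: treat the set as the MSS
        for key in verdict.redundant:
            facts.pop(key, None)
        if verdict.missing:
            facts.update(perceive(verdict.missing))
    return facts


# Toy stand-ins: a fixed scene and a reviewer that knows which facts it needs.
SCENE: Facts = {
    "chair.position": (1.0, 0.0, 2.0),
    "table.position": (0.0, 0.0, 2.0),
    "chair.color": "red",
    "lamp.position": (3.0, 1.0, 0.0),
}
NEEDED = {"chair.position", "table.position"}


def toy_perceive(keys: set[str]) -> Facts:
    return {k: SCENE[k] for k in keys if k in SCENE}


def toy_review(facts: Facts) -> Review:
    return Review(redundant=set(facts) - NEEDED, missing=NEEDED - set(facts))


mss = curate_mss(
    toy_perceive,
    toy_review,
    initial_request={"chair.position", "chair.color", "lamp.position"},
)
```

In this toy run the first review prunes the color and lamp entries and requests the table position; the second review finds nothing to change, so the loop stops with only the two facts the question needs.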
Cite
Text
Guo et al. "Pursuing Minimal Sufficiency in Spatial Reasoning." International Conference on Learning Representations, 2026.
Markdown
[Guo et al. "Pursuing Minimal Sufficiency in Spatial Reasoning." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/guo2026iclr-pursuing/)
BibTeX
@inproceedings{guo2026iclr-pursuing,
title = {{Pursuing Minimal Sufficiency in Spatial Reasoning}},
author = {Guo, Yejie and Hou, Yunzhong and Ma, Wufei and Tang, Meng and Yang, Ming-Hsuan},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/guo2026iclr-pursuing/}
}