Pursuing Minimal Sufficiency in Spatial Reasoning

Abstract

Spatial reasoning, the ability to ground language in 3D understanding, remains a persistent challenge for Vision-Language Models (VLMs). We identify two fundamental bottlenecks: inadequate 3D understanding capabilities stemming from 2D-centric pre-training, and reasoning failures induced by redundant 3D information. To address these, we construct a Minimal Sufficient Set (MSS) of information before answering a given question: a compact selection of 3D perception results from expert models. We introduce MSSR (Minimal Sufficient Spatial Reasoner), a dual-agent framework that implements this principle. A Perception Agent programmatically queries 3D scenes using a versatile perception toolbox to extract sufficient information; the toolbox includes a novel SOG (Situated Orientation Grounding) module that robustly extracts language-grounded directions. A Reasoning Agent then iteratively refines this information to pursue minimality, pruning redundant details and requesting missing ones in a closed loop until the MSS is curated. Extensive experiments demonstrate that our method, by explicitly pursuing both sufficiency and minimality, significantly improves accuracy and achieves state-of-the-art performance on two challenging benchmarks. Furthermore, our framework produces interpretable reasoning paths, offering a promising source of high-quality training data for future models. Source code will be made publicly available.
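
The closed loop described in the abstract (gather perception results, prune what is redundant, request what is missing, stop once the set is both sufficient and minimal) can be illustrated with a toy sketch. This is not the authors' implementation: the names `perceive`, `review`, and `curate_mss`, the dictionary of scene facts, and especially the `needed_keys` argument are hypothetical. In MSSR the judgment of what is needed comes from the Reasoning Agent, whereas here it is simply passed in so the loop can run on its own.

```python
def perceive(scene_facts, requested_keys):
    """Toy Perception Agent: return the requested facts that the scene provides.

    `scene_facts` is a hypothetical stand-in for expert 3D perception outputs.
    """
    return {k: scene_facts[k] for k in requested_keys if k in scene_facts}


def review(info, needed_keys):
    """Toy Reasoning Agent check: split the gap into redundant and missing keys.

    Assumption: the set of needed keys is given; a real system would infer it.
    """
    redundant = set(info) - set(needed_keys)
    missing = set(needed_keys) - set(info)
    return redundant, missing


def curate_mss(scene_facts, initial_keys, needed_keys, max_rounds=5):
    """Iterate prune/request rounds until the information is minimal and sufficient."""
    info = perceive(scene_facts, initial_keys)
    for _ in range(max_rounds):
        redundant, missing = review(info, needed_keys)
        if not redundant and not missing:
            break  # sufficient and minimal: nothing to prune, nothing to request
        for key in redundant:
            del info[key]  # pursue minimality
        fetched = perceive(scene_facts, missing)  # pursue sufficiency
        info.update(fetched)
        if not redundant and not fetched:
            break  # no progress possible: the scene cannot supply what is missing
    return info


# Hypothetical scene facts and question requirements, for illustration only.
scene = {
    "chair.position": (1.0, 0.0, 2.0),
    "chair.color": "red",
    "table.position": (0.0, 0.0, 3.0),
    "person.facing": (0.0, 0.0, 1.0),
}
mss = curate_mss(
    scene,
    initial_keys=["chair.position", "chair.color", "table.position"],
    needed_keys=["chair.position", "person.facing"],
)
print(sorted(mss))
```

In this toy run the chair color and table position are pruned as redundant and the person's facing direction is requested, leaving only the two facts the question needs.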

Cite

Text

Guo et al. "Pursuing Minimal Sufficiency in Spatial Reasoning." International Conference on Learning Representations, 2026.

Markdown

[Guo et al. "Pursuing Minimal Sufficiency in Spatial Reasoning." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/guo2026iclr-pursuing/)

BibTeX

@inproceedings{guo2026iclr-pursuing,
  title     = {{Pursuing Minimal Sufficiency in Spatial Reasoning}},
  author    = {Guo, Yejie and Hou, Yunzhong and Ma, Wufei and Tang, Meng and Yang, Ming-Hsuan},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/guo2026iclr-pursuing/}
}