SPAZER: Spatial-Semantic Progressive Reasoning Agent for Zero-Shot 3D Visual Grounding

Abstract

3D Visual Grounding (3DVG) aims to localize target objects within a 3D scene based on natural language queries. To alleviate the reliance on costly 3D training data, recent studies have explored zero-shot 3DVG by leveraging the extensive knowledge and powerful reasoning capabilities of pre-trained LLMs and VLMs. However, existing paradigms tend to emphasize either spatial (3D-based) or semantic (2D-based) understanding, limiting their effectiveness in complex real-world applications. In this work, we introduce SPAZER, a VLM-driven agent that combines both modalities in a progressive reasoning framework. It first holistically analyzes the scene and produces a 3D rendering from the optimal viewpoint. Based on this, anchor-guided candidate screening is conducted to perform a coarse-level localization of potential objects. Furthermore, leveraging retrieved relevant 2D camera images, 3D-2D joint decision-making is efficiently performed to determine the best-matching object. By bridging spatial and semantic reasoning neural streams, SPAZER achieves robust zero-shot grounding without training on 3D-labeled data. Extensive experiments on ScanRefer and Nr3D benchmarks demonstrate that SPAZER significantly outperforms previous state-of-the-art zero-shot methods, achieving notable gains of $\mathbf{9.0\%}$ and $\mathbf{10.9\%}$ in accuracy.

Cite

Text

Jin et al. "SPAZER: Spatial-Semantic Progressive Reasoning Agent for Zero-Shot 3D Visual Grounding." Advances in Neural Information Processing Systems, 2025.

Markdown

[Jin et al. "SPAZER: Spatial-Semantic Progressive Reasoning Agent for Zero-Shot 3D Visual Grounding." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/jin2025neurips-spazer/)

BibTeX

@inproceedings{jin2025neurips-spazer,
  title     = {{SPAZER: Spatial-Semantic Progressive Reasoning Agent for Zero-Shot 3D Visual Grounding}},
  author    = {Jin, Zhao and Tu, Rong-Cheng and Liao, Jingyi and Sun, Wenhao and Luo, Xiao and Liu, Shunyu and Tao, Dacheng},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/jin2025neurips-spazer/}
}