Multi-Hypothesis Spatial Foundation Model: Rethinking and Decoupling Depth Ambiguity via Laplacian Visual Prompting
Abstract
Depth ambiguity is a fundamental challenge in spatial scene understanding, especially in transparent scenes where single-depth estimates fail to capture full 3D structure. Existing models, limited to deterministic predictions, overlook real-world multi-layer depth. To address this, we introduce a paradigm shift from single-prediction to multi-hypothesis spatial foundation models. We first present MD-3k, a benchmark exposing depth biases in expert and foundation models through multi-layer spatial relationship labels and new metrics. To resolve depth ambiguity, we propose Laplacian Visual Prompting (LVP), a training-free spectral prompting technique that extracts hidden depth from pre-trained models via Laplacian-transformed RGB inputs. By integrating LVP-inferred depth with standard RGB-based estimates, our approach elicits multi-layer depth without model retraining. Extensive experiments validate the effectiveness of LVP in zero-shot multi-layer depth estimation, unlocking more robust and comprehensive geometry-conditioned visual generation, 3D-grounded spatial reasoning, and temporally consistent video-level depth inference. Our benchmark and code will be available at https://github.com/Xiaohao-Xu/Ambiguity-in-Space.
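The LVP idea described above (run the same pre-trained depth model once on the RGB image and once on its Laplacian-transformed version, then keep both outputs as depth hypotheses) can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the authors' implementation: the 4-neighbour kernel, the replicate padding, and the names `laplacian_rgb`, `multi_hypothesis_depth`, and `depth_model` are all assumptions, since the abstract does not specify the kernel, any input normalization, or how the two estimates are integrated.

```python
from typing import Callable, List, Tuple

Image = List[List[List[float]]]  # H x W x 3 nested lists
DepthMap = List[List[float]]     # H x W nested lists

# 4-neighbour discrete Laplacian kernel.
# Assumption: the abstract does not say which Laplacian kernel LVP uses.
KERNEL = ((0, 1, 0), (1, -4, 1), (0, 1, 0))


def laplacian_rgb(image: Image) -> Image:
    """Apply the Laplacian kernel to each channel, using replicate padding."""
    h, w = len(image), len(image[0])
    out = [[[0.0, 0.0, 0.0] for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for c in range(3):
                acc = 0.0
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        # Clamp coordinates so border pixels reuse the edge value.
                        yy = min(max(y + dy, 0), h - 1)
                        xx = min(max(x + dx, 0), w - 1)
                        acc += KERNEL[dy + 1][dx + 1] * image[yy][xx][c]
                out[y][x][c] = acc
    return out


def multi_hypothesis_depth(
    image: Image, depth_model: Callable[[Image], DepthMap]
) -> Tuple[DepthMap, DepthMap]:
    """Return (RGB-based depth, Laplacian-prompted depth) from one frozen model.

    `depth_model` is a hypothetical stand-in for any pre-trained monocular
    depth estimator; no retraining or fine-tuning happens here.
    """
    depth_rgb = depth_model(image)
    depth_lvp = depth_model(laplacian_rgb(image))
    return depth_rgb, depth_lvp


def toy_model(img: Image) -> DepthMap:
    """Toy stand-in model (per-pixel channel mean), only to show the call pattern."""
    return [[sum(px) / 3 for px in row] for row in img]


# A 3 x 3 black image with one white pixel in the centre.
demo = [[[0.0, 0.0, 0.0] for _ in range(3)] for _ in range(3)]
demo[1][1] = [1.0, 1.0, 1.0]
depth_rgb, depth_lvp = multi_hypothesis_depth(demo, toy_model)
```

With a real estimator, the Laplacian output would likely need rescaling into the value range the model expects before inference; that step is omitted here because the abstract does not describe it.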
Cite
Text
Xu et al. "Multi-Hypothesis Spatial Foundation Model: Rethinking and Decoupling Depth Ambiguity via Laplacian Visual Prompting." ICLR 2025 Workshops: FM-Wild, 2025.
Markdown
[Xu et al. "Multi-Hypothesis Spatial Foundation Model: Rethinking and Decoupling Depth Ambiguity via Laplacian Visual Prompting." ICLR 2025 Workshops: FM-Wild, 2025.](https://mlanthology.org/iclrw/2025/xu2025iclrw-multihypothesis/)
BibTeX
@inproceedings{xu2025iclrw-multihypothesis,
title = {{Multi-Hypothesis Spatial Foundation Model: Rethinking and Decoupling Depth Ambiguity via Laplacian Visual Prompting}},
author = {Xu, Xiaohao and Xue, Feng and Li, Xiang and Li, Haowei and Yang, Shusheng and Zhang, Tianyi and Johnson-Roberson, Matthew and Huang, Xiaonan},
booktitle = {ICLR 2025 Workshops: FM-Wild},
year = {2025},
url = {https://mlanthology.org/iclrw/2025/xu2025iclrw-multihypothesis/}
}