Detect Anything 3D in the Wild

Zhang, Hanxue; Jiang, Haoran; Yao, Qingsong; Sun, Yanan; Zhang, Renrui; Zhao, Hao; Li, Hongyang; Zhu, Hongzi; Yang, Zetong

ICCV 2025 pp. 5048-5059

Abstract

Despite the success of deep learning in closed-set 3D object detection, existing approaches struggle with zero-shot generalization to novel objects and camera configurations. We introduce DetAny3D, a promptable 3D detection foundation model capable of detecting any novel object under arbitrary camera configurations using only monocular inputs. Training a foundation model for 3D detection is fundamentally constrained by the limited availability of annotated 3D data, which motivates DetAny3D to leverage the rich prior knowledge embedded in extensively pre-trained 2D foundation models to compensate for this scarcity. To effectively transfer 2D knowledge to 3D, DetAny3D incorporates two core modules: the 2D Aggregator, which aligns features from different 2D foundation models, and the 3D Interpreter with Zero-Embedding Mapping, which stabilizes early training in 2D-to-3D knowledge transfer. Experimental results validate the strong generalization of DetAny3D, which not only achieves state-of-the-art performance on unseen categories and novel camera configurations, but also surpasses most competitors on in-domain data. DetAny3D sheds light on the potential of 3D foundation models for diverse real-world applications, e.g., rare object detection in autonomous driving, and shows promise for further exploration of 3D-centric tasks in open-world settings. More visualization results can be found at our code repository.
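
The abstract does not say how Zero-Embedding Mapping stabilizes early training. One common way to get that effect, assumed here purely for illustration, is to pass the newly added features through a projection whose weights start at zero, so that the pre-trained features are unchanged at the first training step. The sketch below is not the DetAny3D implementation; the names `zero_init_linear`, `linear`, and `inject` are hypothetical.

```python
# Hypothetical illustration of a zero-initialized mapping.
# This is NOT the DetAny3D code; all names and shapes are assumptions.

def zero_init_linear(in_dim, out_dim):
    """Create a linear layer whose weights and bias all start at zero."""
    return {
        "weight": [[0.0] * in_dim for _ in range(out_dim)],
        "bias": [0.0] * out_dim,
    }

def linear(layer, x):
    """Apply y = W x + b using plain Python lists."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(layer["weight"], layer["bias"])
    ]

def inject(pretrained_feature, new_feature, layer):
    """Add the mapped new feature onto the pre-trained feature."""
    delta = linear(layer, new_feature)
    return [p + d for p, d in zip(pretrained_feature, delta)]

# Toy values standing in for a pre-trained 2D feature and a new 3D-related feature.
pretrained = [0.5, -1.0, 2.0]
geometry = [3.0, 4.0]

mapping = zero_init_linear(in_dim=2, out_dim=3)

# At initialization the mapping outputs zeros, so the pre-trained feature passes through intact.
fused_at_init = inject(pretrained, geometry, mapping)

# Simulate one weight becoming non-zero after training; the new feature now contributes.
mapping["weight"][0][0] = 0.1
fused_after_update = inject(pretrained, geometry, mapping)
```

In this toy setup `fused_at_init` equals `pretrained`, and only the first entry of `fused_after_update` differs, which is the gradual-injection behavior the sketch is meant to show.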

Cite

Text

Zhang et al. "Detect Anything 3D in the Wild." International Conference on Computer Vision, 2025.

Markdown

[Zhang et al. "Detect Anything 3D in the Wild." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/zhang2025iccv-detect/)

BibTeX

@inproceedings{zhang2025iccv-detect,
  title     = {{Detect Anything 3D in the Wild}},
  author    = {Zhang, Hanxue and Jiang, Haoran and Yao, Qingsong and Sun, Yanan and Zhang, Renrui and Zhao, Hao and Li, Hongyang and Zhu, Hongzi and Yang, Zetong},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {5048--5059},
  url       = {https://mlanthology.org/iccv/2025/zhang2025iccv-detect/}
}