OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation

Wang, Zhenyu; Li, Ya-Li; Liu, Taichi; Zhao, Hengshuang; Wang, Shengjin

doi:10.1007/978-3-031-72970-6_5

OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation

Zhenyu Wang, Ya-Li Li, Taichi Liu, Hengshuang Zhao, Shengjin Wang

ECCV 2024

doi:10.1007/978-3-031-72970-6_5 /eccv/2024/wang2024eccv-ovuni3detr/

Abstract

In the current state of 3D object detection research, the severe scarcity of annotated 3D data, substantial disparities across different data modalities, and the absence of a unified architecture, have impeded the progress towards the goal of universality. In this paper, we propose OV-Uni3DETR, a unified open-vocabulary 3D detector via cycle-modality propagation. Compared with existing 3D detectors, OV-Uni3DETR offers distinct advantages: 1) Open-vocabulary 3D detection: During training, it leverages various accessible data, especially extensive 2D detection images, to boost training diversity. During inference, it can detect both seen and unseen classes. 2) Modality unifying: It seamlessly accommodates input data from any given modality, effectively addressing scenarios involving disparate modalities or missing sensor information, thereby supporting test-time modality switching. 3) Scene unifying: It provides a unified multi-modal model architecture for diverse scenes collected by distinct sensors. Specifically, we propose the cycle-modality propagation, aimed at propagating knowledge bridging 2D and 3D modalities, to support the aforementioned functionalities. 2D semantic knowledge from large-vocabulary learning guides novel class discovery in the 3D domain, and 3D geometric knowledge provides localization supervision for 2D detection images. OV-Uni3DETR achieves the state-of-the-art performance on various scenarios, surpassing existing methods by more than 6% on average. Its performance using only RGB images is on par with or even surpasses that of previous point cloud based methods. Code is available at https://github. com/zhenyuw16/Uni3DETR.

PDF ECCV Semantic Scholar

Cite

Text

Wang et al. "OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72970-6_5

Markdown

[Wang et al. "OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/wang2024eccv-ovuni3detr/) doi:10.1007/978-3-031-72970-6_5

BibTeX

@inproceedings{wang2024eccv-ovuni3detr,
  title     = {{OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation}},
  author    = {Wang, Zhenyu and Li, Ya-Li and Liu, Taichi and Zhao, Hengshuang and Wang, Shengjin},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-72970-6_5},
  url       = {https://mlanthology.org/eccv/2024/wang2024eccv-ovuni3detr/}
}