BEVDepth: Acquisition of Reliable Depth for Multi-View 3D Object Detection

Abstract

We propose BEVDepth, a 3D object detector with reliable depth estimation for camera-based Bird's-Eye-View (BEV) 3D object detection. Our work rests on a key observation: depth estimation in recent approaches is surprisingly inadequate, even though depth is essential to camera-based 3D detection. BEVDepth addresses this by leveraging explicit depth supervision. We also introduce a camera-aware depth estimation module to improve depth prediction, and we design a novel Depth Refinement Module to counter the side effects of imprecise feature unprojection. Aided by a customized Efficient Voxel Pooling operation and a multi-frame mechanism, BEVDepth achieves a new state-of-the-art 60.9% NDS on the challenging nuScenes test set while maintaining high efficiency. This is the first time a camera-based model has reached 60% NDS. Code has been released.
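
To make the idea of explicit depth supervision concrete, the following is a minimal sketch and not the authors' implementation. It assumes that sparse ground-truth depths (for example, projected LiDAR points) are discretized into uniform bins and that the predicted per-pixel depth distribution is trained with a softmax cross-entropy loss. The function names, the depth range (2 m to 58 m), the bin size (0.5 m), and the choice of loss are all illustrative assumptions; the abstract does not specify them.

```python
import math


def depth_to_bin(depth_m, d_min=2.0, d_max=58.0, bin_size=0.5):
    """Return the uniform depth-bin index for one metric depth.

    Depths outside [d_min, d_max) get -1, meaning "no supervision here".
    The range and bin size are hypothetical defaults, not values from the paper.
    """
    if not (d_min <= depth_m < d_max):
        return -1
    return int((depth_m - d_min) // bin_size)


def depth_cross_entropy(logits, targets):
    """Mean softmax cross-entropy over pixels that have a valid target bin.

    logits:  list of per-pixel lists holding D unnormalised depth scores.
    targets: list of bin indices; -1 marks pixels without ground-truth depth.
    """
    total, count = 0.0, 0
    for scores, target in zip(logits, targets):
        if target < 0:
            continue  # skip pixels with no ground-truth depth
        peak = max(scores)  # subtract the max for numerical stability
        log_z = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total += log_z - scores[target]
        count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    num_bins = 112  # (58 - 2) / 0.5 under the assumed defaults
    gt_depths = [2.0, 10.3, 60.0]  # the last value is out of range
    targets = [depth_to_bin(d) for d in gt_depths]
    logits = [[0.0] * num_bins for _ in gt_depths]  # uninformative prediction
    print(targets, depth_cross_entropy(logits, targets))
```

Only pixels with a valid ground-truth depth contribute to the loss, which reflects how sparse such supervision typically is in image space.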

Cite

Text

Li et al. "BEVDepth: Acquisition of Reliable Depth for Multi-View 3D Object Detection." AAAI Conference on Artificial Intelligence, 2023. doi:10.1609/AAAI.V37I2.25233

Markdown

[Li et al. "BEVDepth: Acquisition of Reliable Depth for Multi-View 3D Object Detection." AAAI Conference on Artificial Intelligence, 2023.](https://mlanthology.org/aaai/2023/li2023aaai-bevdepth/) doi:10.1609/AAAI.V37I2.25233

BibTeX

@inproceedings{li2023aaai-bevdepth,
  title     = {{BEVDepth: Acquisition of Reliable Depth for Multi-View 3D Object Detection}},
  author    = {Li, Yinhao and Ge, Zheng and Yu, Guanyi and Yang, Jinrong and Wang, Zengran and Shi, Yukang and Sun, Jianjian and Li, Zeming},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2023},
  pages     = {1477-1485},
  doi       = {10.1609/AAAI.V37I2.25233},
  url       = {https://mlanthology.org/aaai/2023/li2023aaai-bevdepth/}
}