Is Pseudo-LiDAR Needed for Monocular 3D Object Detection?

Abstract

Recent progress in 3D object detection from single images leverages monocular depth estimation as a way to produce 3D point clouds, turning cameras into pseudo-lidar sensors. These two-stage detectors improve with the accuracy of the intermediate depth estimation network, which can itself be improved without manual labels via large-scale self-supervised learning. However, they tend to suffer from overfitting more than end-to-end methods, are more complex, and the gap with similar lidar-based detectors remains significant. In this work, we propose an end-to-end, single-stage, monocular 3D object detector, DD3D, that can benefit from depth pre-training like pseudo-lidar methods, but without their limitations. Our architecture is designed for effective information transfer between depth estimation and 3D detection, allowing us to scale with the amount of unlabeled pre-training data. Our method achieves state-of-the-art results on two challenging benchmarks, with 16.34% and 9.28% AP for Cars and Pedestrians (respectively) on the KITTI-3D benchmark, and 41.5% mAP on NuScenes.
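
To make the pseudo-lidar step mentioned in the abstract concrete, the following minimal sketch back-projects a predicted depth map into a 3D point cloud under a standard pinhole camera model. It is an illustration only, not the DD3D code or any specific pseudo-lidar implementation: the function name `backproject_depth`, the nested-list depth format, and the intrinsics arguments `fx`, `fy`, `cx`, `cy` are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def backproject_depth(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Lift a per-pixel depth map to camera-frame 3D points (hypothetical helper).

    depth[v][u] is the estimated depth in meters at pixel column u, row v.
    fx, fy are focal lengths in pixels; cx, cy is the principal point.
    Pixels with non-positive depth are treated as invalid and skipped.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0.0:
                continue  # no valid depth estimate for this pixel
            # Invert the pinhole projection u = fx * x / z + cx (same for v).
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 2x2 depth map with one invalid pixel and made-up intrinsics.
toy_depth = [[2.0, 0.0], [4.0, 2.0]]
toy_points = backproject_depth(toy_depth, fx=2.0, fy=2.0, cx=0.0, cy=0.0)
```

In a two-stage pseudo-lidar pipeline, a point cloud of this kind would then be passed to a lidar-style 3D detector; the single-stage detector proposed in the abstract is described as avoiding that intermediate conversion.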

Cite

Text

Park et al. "Is Pseudo-LiDAR Needed for Monocular 3D Object Detection?." International Conference on Computer Vision, 2021. doi:10.1109/ICCV48922.2021.00313

Markdown

[Park et al. "Is Pseudo-LiDAR Needed for Monocular 3D Object Detection?." International Conference on Computer Vision, 2021.](https://mlanthology.org/iccv/2021/park2021iccv-pseudolidar/) doi:10.1109/ICCV48922.2021.00313

BibTeX

@inproceedings{park2021iccv-pseudolidar,
  title     = {{Is Pseudo-LiDAR Needed for Monocular 3D Object Detection?}},
  author    = {Park, Dennis and Ambrus, Rares and Guizilini, Vitor and Li, Jie and Gaidon, Adrien},
  booktitle = {International Conference on Computer Vision},
  year      = {2021},
  pages     = {3142-3152},
  doi       = {10.1109/ICCV48922.2021.00313},
  url       = {https://mlanthology.org/iccv/2021/park2021iccv-pseudolidar/}
}