Boosting 3D Object Detection by Simulating Multimodality on Point Clouds

Abstract

This paper presents a new approach to boost a single-modality (LiDAR) 3D object detector by teaching it to simulate the features and responses of a multi-modality (LiDAR-image) detector. The approach needs LiDAR-image data only when training the single-modality detector; once well trained, it needs only LiDAR data at inference. We design a novel framework to realize the approach: response distillation to focus on the crucial response samples and avoid the background samples; sparse-voxel distillation to learn voxel semantics and relations from the estimated crucial voxels; fine-grained voxel-to-point distillation to better attend to features of small and distant objects; and instance distillation to further enhance the deep-feature consistency. Experimental results on the nuScenes dataset show that our approach outperforms all SOTA LiDAR-only 3D detectors and even surpasses the baseline LiDAR-image detector on the key NDS metric, filling about 72% of the mAP gap between the single- and multi-modality detectors.
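To give a rough sense of the response-distillation idea described in the abstract, the sketch below computes a distillation loss only at "crucial" locations where the multi-modality teacher responds confidently, masking out background samples. The tensor shapes, names (`student_logits`, `teacher_logits`), and the 0.3 threshold are hypothetical assumptions for illustration; this is a minimal sketch, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def response_distillation_loss(student_logits: torch.Tensor,
                               teacher_logits: torch.Tensor,
                               crucial_thresh: float = 0.3) -> torch.Tensor:
    """Distill detection-head responses only at crucial locations.

    Both tensors are assumed to have shape (B, C, H, W): per-class
    heatmap logits from the LiDAR-only student and the LiDAR-image
    teacher. The threshold is illustrative, not from the paper.
    """
    teacher_prob = teacher_logits.sigmoid()
    # Crucial samples: locations where the teacher is confident about
    # some object class; background locations are masked out.
    mask = (teacher_prob.max(dim=1, keepdim=True).values > crucial_thresh).float()
    # Match the student's responses to the teacher's at those locations.
    per_elem = F.mse_loss(student_logits.sigmoid(), teacher_prob, reduction="none")
    return (per_elem * mask).sum() / mask.sum().clamp(min=1.0)
```

Masking by teacher confidence keeps the loss from being dominated by the vast number of empty background cells in a LiDAR bird's-eye-view heatmap; the other three distillation terms (sparse-voxel, voxel-to-point, instance) would be added to this loss during training.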

Cite

Text

Zheng et al. "Boosting 3D Object Detection by Simulating Multimodality on Point Clouds." Conference on Computer Vision and Pattern Recognition, 2022. doi:10.1109/CVPR52688.2022.01327

Markdown

[Zheng et al. "Boosting 3D Object Detection by Simulating Multimodality on Point Clouds." Conference on Computer Vision and Pattern Recognition, 2022.](https://mlanthology.org/cvpr/2022/zheng2022cvpr-boosting/) doi:10.1109/CVPR52688.2022.01327

BibTeX

@inproceedings{zheng2022cvpr-boosting,
  title     = {{Boosting 3D Object Detection by Simulating Multimodality on Point Clouds}},
  author    = {Zheng, Wu and Hong, Mingxuan and Jiang, Li and Fu, Chi-Wing},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2022},
  pages     = {13638--13647},
  doi       = {10.1109/CVPR52688.2022.01327},
  url       = {https://mlanthology.org/cvpr/2022/zheng2022cvpr-boosting/}
}