Boosting 3D Object Detection by Simulating Multimodality on Point Clouds
Abstract
This paper presents a new approach to boost a single-modality (LiDAR) 3D object detector by teaching it to simulate features and responses that follow a multi-modality (LiDAR-image) detector. The approach needs LiDAR-image data only when training the single-modality detector, and once well-trained, it only needs LiDAR data at inference. We design a novel framework to realize the approach: response distillation to focus on the crucial response samples and avoid most background samples; sparse-voxel distillation to learn voxel semantics and relations from the estimated crucial voxels; a fine-grained voxel-to-point distillation to better attend to features of small and distant objects; and instance distillation to further enhance the deep-feature consistency. Experimental results on the nuScenes dataset show that our approach outperforms all SOTA LiDAR-only 3D detectors and even surpasses the baseline LiDAR-image detector on the key NDS metric, filling ~72% of the mAP gap between the single- and multi-modality detectors.
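To make the distillation idea in the abstract concrete, here is a minimal sketch (not the authors' code) of how a multi-modality teacher could supervise a LiDAR-only student while restricting the response loss to crucial samples rather than the background. All tensor shapes, the function name `distillation_loss`, and the `crucial_thresh` parameter are illustrative assumptions.

```python
# Minimal sketch of cross-modality distillation as described in the
# abstract: a LiDAR-image teacher supervises a LiDAR-only student.
# Shapes and the threshold below are assumptions, not the paper's exact setup.
import torch
import torch.nn.functional as F

def distillation_loss(student_heatmap, teacher_heatmap,
                      student_feats, teacher_feats,
                      crucial_thresh=0.1):
    """Combine response and feature distillation terms.

    student_heatmap, teacher_heatmap: (B, C, H, W) class response maps.
    student_feats, teacher_feats:     (B, D, H, W) BEV feature maps.
    """
    # Response distillation: only distill where the teacher responds
    # confidently, focusing on crucial samples and skipping the
    # overwhelming number of background locations.
    crucial = (teacher_heatmap.max(dim=1, keepdim=True).values
               > crucial_thresh).float()
    resp = F.mse_loss(student_heatmap * crucial,
                      teacher_heatmap * crucial)

    # Feature distillation: pull the student's LiDAR-only features
    # toward the image-aware teacher features at the same crucial
    # locations (a stand-in for the sparse-voxel and instance terms).
    feat = F.mse_loss(student_feats * crucial,
                      teacher_feats * crucial)

    return resp + feat
```

At inference, only the trained student runs, so no image data is needed, matching the LiDAR-only deployment described above.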
Cite
Text
Zheng et al. "Boosting 3D Object Detection by Simulating Multimodality on Point Clouds." Conference on Computer Vision and Pattern Recognition, 2022. doi:10.1109/CVPR52688.2022.01327
Markdown
[Zheng et al. "Boosting 3D Object Detection by Simulating Multimodality on Point Clouds." Conference on Computer Vision and Pattern Recognition, 2022.](https://mlanthology.org/cvpr/2022/zheng2022cvpr-boosting/) doi:10.1109/CVPR52688.2022.01327
BibTeX
@inproceedings{zheng2022cvpr-boosting,
title = {{Boosting 3D Object Detection by Simulating Multimodality on Point Clouds}},
author = {Zheng, Wu and Hong, Mingxuan and Jiang, Li and Fu, Chi-Wing},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2022},
pages = {13638--13647},
doi = {10.1109/CVPR52688.2022.01327},
url = {https://mlanthology.org/cvpr/2022/zheng2022cvpr-boosting/}
}