X3KD: Knowledge Distillation Across Modalities, Tasks and Stages for Multi-Camera 3D Object Detection

Klingner, Marvin; Borse, Shubhankar; Kumar, Varun Ravi; Rezaei, Behnaz; Narayanan, Venkatraman; Yogamani, Senthil; Porikli, Fatih

doi:10.1109/CVPR52729.2023.01282

CVPR 2023, pp. 13343-13353

Abstract

Recent advances in 3D object detection (3DOD) have obtained remarkably strong results for LiDAR-based models. In contrast, surround-view 3DOD models based on multiple camera images underperform due to the necessary view transformation of features from perspective view (PV) to a 3D world representation which is ambiguous due to missing depth information. This paper introduces X3KD, a comprehensive knowledge distillation framework across different modalities, tasks, and stages for multi-camera 3DOD. Specifically, we propose cross-task distillation from an instance segmentation teacher (X-IS) in the PV feature extraction stage providing supervision without ambiguous error backpropagation through the view transformation. After the transformation, we apply cross-modal feature distillation (X-FD) and adversarial training (X-AT) to improve the 3D world representation of multi-camera features through the information contained in a LiDAR-based 3DOD teacher. Finally, we also employ this teacher for cross-modal output distillation (X-OD), providing dense supervision at the prediction stage. We perform extensive ablations of knowledge distillation at different stages of multi-camera 3DOD. Our final X3KD model outperforms previous state-of-the-art approaches on the nuScenes and Waymo datasets and generalizes to RADAR-based 3DOD. Qualitative results video at https://youtu.be/1do9DPFmr38.
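The abstract names two cross-modal losses, feature distillation (X-FD) and output distillation (X-OD), without giving their formulas. As a rough illustration only, the sketch below uses generic stand-ins: a mean squared error between camera-student and LiDAR-teacher feature maps, and a binary cross-entropy against the teacher's soft class heatmap. The function names, tensor shapes, and the choice of these particular losses are assumptions for illustration, not the formulation used in the paper.

```python
import numpy as np


def feature_distillation_loss(student_bev, teacher_bev):
    """Generic feature-mimicking loss (assumed stand-in for X-FD).

    Returns the mean squared error between two feature maps of equal shape,
    e.g. (channels, height, width) in a bird's-eye-view grid.
    """
    student_bev = np.asarray(student_bev, dtype=np.float64)
    teacher_bev = np.asarray(teacher_bev, dtype=np.float64)
    if student_bev.shape != teacher_bev.shape:
        raise ValueError("student and teacher feature maps must share a shape")
    return float(np.mean((student_bev - teacher_bev) ** 2))


def output_distillation_loss(student_logits, teacher_logits):
    """Generic soft-target loss (assumed stand-in for X-OD).

    The teacher logits are turned into probabilities with a sigmoid and used
    as dense soft targets for a binary cross-entropy on the student logits.
    """
    x = np.asarray(student_logits, dtype=np.float64)
    t = 1.0 / (1.0 + np.exp(-np.asarray(teacher_logits, dtype=np.float64)))
    # Numerically stable binary cross-entropy with logits.
    bce = np.maximum(x, 0.0) - x * t + np.log1p(np.exp(-np.abs(x)))
    return float(np.mean(bce))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher_feat = rng.normal(size=(8, 16, 16))  # hypothetical LiDAR BEV features
    student_feat = teacher_feat + 0.1 * rng.normal(size=(8, 16, 16))
    print("feature loss:", feature_distillation_loss(student_feat, teacher_feat))

    teacher_out = rng.normal(size=(10, 16, 16))  # hypothetical class heatmap logits
    student_out = rng.normal(size=(10, 16, 16))
    print("output loss:", output_distillation_loss(student_out, teacher_out))
```

In a training loop, terms like these would typically be added to the student's detection loss with tunable weights; the teacher is only needed during training, so inference cost of the camera model is unchanged.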

Cite

Text

Klingner et al. "X3KD: Knowledge Distillation Across Modalities, Tasks and Stages for Multi-Camera 3D Object Detection." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.01282

Markdown

[Klingner et al. "X3KD: Knowledge Distillation Across Modalities, Tasks and Stages for Multi-Camera 3D Object Detection." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/klingner2023cvpr-x3kd/) doi:10.1109/CVPR52729.2023.01282

BibTeX

@inproceedings{klingner2023cvpr-x3kd,
  title     = {{X3KD: Knowledge Distillation Across Modalities, Tasks and Stages for Multi-Camera 3D Object Detection}},
  author    = {Klingner, Marvin and Borse, Shubhankar and Kumar, Varun Ravi and Rezaei, Behnaz and Narayanan, Venkatraman and Yogamani, Senthil and Porikli, Fatih},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {13343--13353},
  doi       = {10.1109/CVPR52729.2023.01282},
  url       = {https://mlanthology.org/cvpr/2023/klingner2023cvpr-x3kd/}
}