STXD: Structural and Temporal Cross-Modal Distillation for Multi-View 3D Object Detection

Abstract

3D object detection (3DOD) from multi-view images is an economically appealing alternative to expensive LiDAR-based detectors, but also an extremely challenging task due to the absence of precise spatial cues. Recent studies have leveraged the teacher-student paradigm for cross-modal distillation, where a strong LiDAR-modality teacher transfers useful knowledge to a multi-view image-modality student. However, prior approaches have focused only on minimizing global distances between cross-modal features, which may lead to suboptimal knowledge distillation results. To address this limitation, we propose a novel structural and temporal cross-modal knowledge distillation (STXD) framework for multi-view 3DOD. First, STXD reduces the redundancy of the student's feature components by regularizing the cross-correlation of cross-modal features while maximizing their similarities. Second, to effectively transfer temporal knowledge, STXD encodes temporal relations of features across a sequence of frames via similarity maps. Lastly, STXD adopts a response distillation method to further enhance the quality of knowledge distillation at the output level. Our extensive experiments demonstrate that STXD significantly improves the NDS and mAP of the base student detectors by 2.8% to 4.5% on the nuScenes test set.
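The abstract describes the structural and temporal terms only in words, so the following is a rough NumPy sketch of what such losses might look like, not the authors' implementation. It assumes a Barlow-Twins-style formulation for the cross-correlation regularizer and cosine similarity for the temporal similarity maps; the function names, the `(N, D)` and `(T, D)` feature layouts, and the `off_diag_weight` value are all hypothetical choices made for illustration.

```python
import numpy as np


def cross_correlation_loss(student, teacher, off_diag_weight=0.005, eps=1e-8):
    """Illustrative structural loss (assumed form, not taken from the paper).

    student, teacher: arrays of shape (N, D), with N samples (for example
    flattened feature-map cells) and D feature channels.
    """
    # Standardize every channel over the sample axis.
    s = (student - student.mean(axis=0)) / (student.std(axis=0) + eps)
    t = (teacher - teacher.mean(axis=0)) / (teacher.std(axis=0) + eps)

    # (D, D) cross-correlation matrix between student and teacher channels.
    c = s.T @ t / s.shape[0]
    diag = np.diag(c)

    # Diagonal term: matching channels should be maximally similar (correlation 1).
    on_diag = np.sum((1.0 - diag) ** 2)
    # Off-diagonal term: penalize correlation between different channels,
    # which discourages redundant feature components.
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)
    return float(on_diag + off_diag_weight * off_diag)


def temporal_similarity_map(frames, eps=1e-8):
    """Pairwise cosine similarity between frames; frames has shape (T, D)."""
    normed = frames / (np.linalg.norm(frames, axis=1, keepdims=True) + eps)
    return normed @ normed.T  # shape (T, T)


def temporal_distillation_loss(student_frames, teacher_frames):
    """Mean squared difference between student and teacher similarity maps."""
    diff = temporal_similarity_map(student_frames) - temporal_similarity_map(teacher_frames)
    return float(np.mean(diff ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher_feat = rng.normal(size=(256, 8))      # hypothetical LiDAR teacher features
    student_feat = rng.normal(size=(256, 8))      # hypothetical image student features
    print(cross_correlation_loss(student_feat, teacher_feat))

    teacher_seq = rng.normal(size=(4, 16))        # 4 frames, 16-dim pooled features
    student_seq = rng.normal(size=(4, 16))
    print(temporal_distillation_loss(student_seq, teacher_seq))
```

In this sketch the temporal term compares relations between frames rather than the features themselves, so a student whose features differ from the teacher's in scale can still achieve a low loss as long as the frame-to-frame similarity pattern matches. The response (output-level) distillation term mentioned in the abstract is not sketched here.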

Cite

Text

Jang et al. "STXD: Structural and Temporal Cross-Modal Distillation for Multi-View 3D Object Detection." Neural Information Processing Systems, 2023.

Markdown

[Jang et al. "STXD: Structural and Temporal Cross-Modal Distillation for Multi-View 3D Object Detection." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/jang2023neurips-stxd/)

BibTeX

@inproceedings{jang2023neurips-stxd,
  title     = {{STXD: Structural and Temporal Cross-Modal Distillation for Multi-View 3D Object Detection}},
  author    = {Jang, Sujin and Jo, Dae Ung and Hwang, Sung Ju and Lee, Dongwook and Ji, Daehyun},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/jang2023neurips-stxd/}
}