VeXKD: The Versatile Integration of Cross-Modal Fusion and Knowledge Distillation for 3D Perception

Abstract

Recent advancements in 3D perception have led to a proliferation of network architectures, particularly those involving multi-modal fusion algorithms. While these fusion algorithms improve accuracy, their complexity often impedes real-time performance. This paper introduces VeXKD, an effective and Versatile framework that integrates Cross-Modal Fusion with Knowledge Distillation. VeXKD applies knowledge distillation exclusively to the Bird's Eye View (BEV) feature maps, enabling the transfer of cross-modal insights to single-modal students without additional inference-time overhead. It avoids volatile components that can vary across 3D perception tasks and student modalities, thus improving versatility. The framework adopts a modality-general cross-modal fusion module to bridge the modality gap between the multi-modal teachers and single-modal students. Furthermore, by leveraging byproducts generated during fusion, our BEV-query-guided mask generation network identifies crucial spatial locations across different BEV feature maps in a data-driven manner, significantly enhancing the effectiveness of knowledge distillation. Extensive experiments on the nuScenes dataset demonstrate notable improvements: increases of up to 6.9% in mAP and 4.2% in NDS for 3D detection tasks and up to 4.3% in mIoU for BEV map segmentation tasks, narrowing the performance gap with multi-modal models.
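The abstract does not give the exact distillation objective. As a rough illustration only, the Python/NumPy sketch below shows one common form of masked feature distillation on BEV maps: a per-cell squared error between student and teacher BEV features, weighted by a spatial importance mask. The function name `masked_bev_distillation_loss`, the sigmoid mask, and the squared-error objective are assumptions, not VeXKD's published implementation. In VeXKD the mask comes from the BEV-query-guided mask generation network, which is not modeled here; the mask logits are simply passed in.

```python
import numpy as np


def masked_bev_distillation_loss(student_bev, teacher_bev, mask_logits, eps=1e-6):
    """Hypothetical masked feature-imitation loss on BEV feature maps.

    student_bev, teacher_bev: arrays of shape (C, H, W). Assumes the student's
        channels have already been projected to match the teacher's.
    mask_logits: array of shape (H, W). Assumed to be given; in VeXKD a learned
        mask generation network would produce it.
    """
    if student_bev.shape != teacher_bev.shape:
        raise ValueError("student and teacher BEV maps must have the same shape")
    if mask_logits.shape != teacher_bev.shape[1:]:
        raise ValueError("mask must match the spatial (H, W) size of the BEV maps")
    # Sigmoid maps logits to per-cell importance weights in (0, 1).
    weights = 1.0 / (1.0 + np.exp(-mask_logits))
    # Squared error per BEV cell, summed over channels -> shape (H, W).
    per_cell = ((student_bev - teacher_bev) ** 2).sum(axis=0)
    # Weighted mean, so the loss scale does not depend on how sparse the mask is.
    return float((weights * per_cell).sum() / (weights.sum() + eps))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.standard_normal((64, 128, 128))
    student = teacher + 0.1 * rng.standard_normal(teacher.shape)
    logits = rng.standard_normal((128, 128))
    print(masked_bev_distillation_loss(student, teacher, logits))
```

Normalizing by the total mask weight is a design choice made here so that a mask that highlights only a few cells does not shrink the loss toward zero; other normalizations are equally plausible.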

Cite

Text

Ji et al. "VeXKD: The Versatile Integration of Cross-Modal Fusion and Knowledge Distillation for 3D Perception." Neural Information Processing Systems, 2024. doi:10.52202/079017-3991

Markdown

[Ji et al. "VeXKD: The Versatile Integration of Cross-Modal Fusion and Knowledge Distillation for 3D Perception." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/ji2024neurips-vexkd/) doi:10.52202/079017-3991

BibTeX

@inproceedings{ji2024neurips-vexkd,
  title     = {{VeXKD: The Versatile Integration of Cross-Modal Fusion and Knowledge Distillation for 3D Perception}},
  author    = {Ji, Yuzhe and Chen, Yijie and Yang, Liuqing and Ding, Rui and Yang, Meng and Zheng, Xinhu},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-3991},
  url       = {https://mlanthology.org/neurips/2024/ji2024neurips-vexkd/}
}