CAT-Det: Contrastively Augmented Transformer for Multi-Modal 3D Object Detection

Abstract

In autonomous driving, LiDAR point clouds and RGB images are two major data modalities with complementary cues for 3D object detection. However, exploiting both fully is difficult because of the large discrepancies between the modalities. To address this issue, we propose a novel framework, the Contrastively Augmented Transformer for multi-modal 3D object Detection (CAT-Det). Specifically, CAT-Det adopts a two-stream structure consisting of a Pointformer (PT) branch and an Imageformer (IT) branch, along with a Cross-Modal Transformer (CMT) module. PT, IT and CMT jointly encode intra-modal and inter-modal long-range contexts for representing an object, thus fully exploiting multi-modal information for detection. Furthermore, we propose an effective One-way Multi-modal Data Augmentation (OMDA) approach based on hierarchical contrastive learning at both the point and object levels. OMDA significantly improves accuracy by augmenting only the point clouds, which avoids the complex generation of paired samples of the two modalities. Extensive experiments on the KITTI benchmark show that CAT-Det achieves a new state of the art, highlighting its effectiveness.
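
To make the idea of contrastive learning at two levels more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a generic InfoNCE-style loss, which the abstract does not specify, applied once to point-level features and once to object-level features taken from an original and an augmented point cloud. All names here (`info_nce`, `hierarchical_loss`, `w_point`, `w_obj`, `temperature`) are hypothetical and chosen only for illustration.

```python
import math


def l2_normalize(v):
    """Scale a feature vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch of feature vectors.

    anchors[i] is pulled towards positives[i] and pushed away from
    positives[j] for j != i. This generic loss is an assumption, not
    necessarily the loss used in the paper.
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


def hierarchical_loss(point_orig, point_aug, obj_orig, obj_aug,
                      w_point=1.0, w_obj=1.0):
    """Weighted sum of a point-level and an object-level contrastive term.

    The weights and the simple additive combination are illustrative assumptions.
    """
    return (w_point * info_nce(point_orig, point_aug)
            + w_obj * info_nce(obj_orig, obj_aug))
```

In this sketch, features from the same point or object before and after augmentation form the positive pairs, and all other entries in the batch act as negatives. Only the point-cloud side is augmented, which mirrors the one-way nature of OMDA described above.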

Cite

Text

Zhang et al. "CAT-Det: Contrastively Augmented Transformer for Multi-Modal 3D Object Detection." Conference on Computer Vision and Pattern Recognition, 2022. doi:10.1109/CVPR52688.2022.00098

Markdown

[Zhang et al. "CAT-Det: Contrastively Augmented Transformer for Multi-Modal 3D Object Detection." Conference on Computer Vision and Pattern Recognition, 2022.](https://mlanthology.org/cvpr/2022/zhang2022cvpr-catdet/) doi:10.1109/CVPR52688.2022.00098

BibTeX

@inproceedings{zhang2022cvpr-catdet,
  title     = {{CAT-Det: Contrastively Augmented Transformer for Multi-Modal 3D Object Detection}},
  author    = {Zhang, Yanan and Chen, Jiaxin and Huang, Di},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2022},
  pages     = {908--917},
  doi       = {10.1109/CVPR52688.2022.00098},
  url       = {https://mlanthology.org/cvpr/2022/zhang2022cvpr-catdet/}
}