DeepInteraction: 3D Object Detection via Modality Interaction
Abstract
Existing top-performing 3D object detectors typically rely on a multi-modal fusion strategy. This design is, however, fundamentally restricted: fusing the modalities into a single hybrid representation overlooks modality-specific information, ultimately hampering model performance. To address this limitation, in this work we introduce a novel modality interaction strategy in which individual per-modality representations are learned and maintained throughout, so that their unique characteristics can be exploited during object detection. To realize the proposed strategy, we design a DeepInteraction architecture characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder. Experiments on the large-scale nuScenes dataset show that our proposed method surpasses all prior arts, often by a large margin. Crucially, our method ranks first on the highly competitive nuScenes object detection leaderboard.
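To make the distinction between fusion and interaction concrete, below is a minimal, illustrative sketch of one "representational interaction" step: each modality keeps its own feature stream and exchanges information with the other via bidirectional cross-attention, instead of being collapsed into a single fused tensor. This is not the authors' released code; the module and variable names (`ModalityInteractionBlock`, `lidar_tokens`, `image_tokens`, `d_model`) are assumptions for the example.

```python
# Illustrative sketch only, NOT the DeepInteraction implementation.
# Shows the general idea: two modality streams in, two modality streams out,
# with cross-attention carrying information between them.
import torch
import torch.nn as nn


class ModalityInteractionBlock(nn.Module):
    """One round of bidirectional cross-modal representational interaction."""

    def __init__(self, d_model: int = 256, n_heads: int = 8):
        super().__init__()
        # LiDAR stream queries the image stream, and vice versa.
        self.lidar_from_image = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.image_from_lidar = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm_lidar = nn.LayerNorm(d_model)
        self.norm_image = nn.LayerNorm(d_model)

    def forward(self, lidar_tokens: torch.Tensor, image_tokens: torch.Tensor):
        # Each modality attends to the other but retains its own stream,
        # so modality-specific information is preserved across the encoder.
        l2i, _ = self.lidar_from_image(lidar_tokens, image_tokens, image_tokens)
        i2l, _ = self.image_from_lidar(image_tokens, lidar_tokens, lidar_tokens)
        lidar_out = self.norm_lidar(lidar_tokens + l2i)
        image_out = self.norm_image(image_tokens + i2l)
        return lidar_out, image_out  # two per-modality representations, not one fused one


if __name__ == "__main__":
    block = ModalityInteractionBlock()
    lidar = torch.randn(2, 400, 256)   # e.g. flattened LiDAR BEV tokens
    image = torch.randn(2, 600, 256)   # e.g. flattened image feature tokens
    lidar, image = block(lidar, image)
    print(lidar.shape, image.shape)    # both streams survive the block unchanged in shape
```

The key design point the sketch captures is that the block's output is still a pair of modality-specific representations, which is what allows later stages (in the paper, the predictive interaction decoder) to exploit each modality's unique characteristics.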
Cite

Text

Yang et al. "DeepInteraction: 3D Object Detection via Modality Interaction." Neural Information Processing Systems, 2022.

Markdown

[Yang et al. "DeepInteraction: 3D Object Detection via Modality Interaction." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/yang2022neurips-deepinteraction/)

BibTeX
@inproceedings{yang2022neurips-deepinteraction,
title = {{DeepInteraction: 3D Object Detection via Modality Interaction}},
author = {Yang, Zeyu and Chen, Jiaqi and Miao, Zhenwei and Li, Wei and Zhu, Xiatian and Zhang, Li},
booktitle = {Neural Information Processing Systems},
year = {2022},
url = {https://mlanthology.org/neurips/2022/yang2022neurips-deepinteraction/}
}