SA-Det3D: Self-Attention Based Context-Aware 3D Object Detection

Abstract

Existing point-cloud based 3D object detectors use convolution-like operators to process information in a local neighbourhood with fixed-weight kernels and aggregate global context hierarchically. However, non-local neural networks and self-attention for 2D vision have shown that explicitly modeling long-range interactions can lead to more robust and competitive models. In this paper, we propose two variants of self-attention for contextual modeling in 3D object detection by augmenting convolutional features with self-attention features. We first incorporate the pairwise self-attention mechanism into the current state-of-the-art BEV, voxel and point-based detectors and show consistent improvement over strong baseline models of up to 1.5 3D AP while simultaneously reducing their parameter footprint and computational cost by 15-80% and 30-50%, respectively, on the KITTI validation set. We next propose a self-attention variant that samples a subset of the most representative features by learning deformations over randomly sampled locations. This not only allows us to scale explicit global contextual modeling to larger point-clouds, but also leads to more discriminative and informative feature descriptors. Our method can be flexibly applied to most state-of-the-art detectors with increased accuracy and parameter and compute efficiency. We show our proposed method improves 3D object detection performance on KITTI, nuScenes and Waymo Open datasets. Code is available at https://github.com/AutoVision-cloud/SA-Det3D.
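As a rough illustration of what "augmenting convolutional features with self-attention features" can look like, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes identity query/key/value projections (a trained model would use learned linear maps) and combines the two feature sets with a residual sum, which the abstract does not specify. The helper names `self_attention`, `augment` and `sample_context` are hypothetical. The subset variant here only draws a random sample of context features; the learned deformations described in the abstract are omitted.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def self_attention(features, context=None):
    """Scaled dot-product attention of every feature over `context`.

    Assumption: queries, keys and values are the raw feature vectors
    (identity projections). With context=None, every feature attends
    to all features (full pairwise attention).
    """
    context = features if context is None else context
    scale = math.sqrt(len(features[0]))
    attended = []
    for q in features:
        weights = softmax([dot(q, k) / scale for k in context])
        attended.append([
            sum(w * v[d] for w, v in zip(weights, context))
            for d in range(len(q))
        ])
    return attended


def augment(features, context=None):
    """Add attention output to the input features (assumed residual sum)."""
    attended = self_attention(features, context)
    return [[f + a for f, a in zip(f_row, a_row)]
            for f_row, a_row in zip(features, attended)]


def sample_context(features, k, seed=0):
    """Randomly pick k features as attention context (no learned offsets)."""
    return random.Random(seed).sample(features, k)


# Toy stand-in for convolutional features at three locations.
conv_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
full = augment(conv_features)
subset = augment(conv_features, sample_context(conv_features, 2))
```

Attending over a sampled subset reduces the cost per query from the number of all features to `k`, which is the scaling motivation the abstract gives for the second variant.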

Cite

Text

Bhattacharyya et al. "SA-Det3D: Self-Attention Based Context-Aware 3D Object Detection." IEEE/CVF International Conference on Computer Vision Workshops, 2021. doi:10.1109/ICCVW54120.2021.00337

Markdown

[Bhattacharyya et al. "SA-Det3D: Self-Attention Based Context-Aware 3D Object Detection." IEEE/CVF International Conference on Computer Vision Workshops, 2021.](https://mlanthology.org/iccvw/2021/bhattacharyya2021iccvw-sadet3d/) doi:10.1109/ICCVW54120.2021.00337

BibTeX

@inproceedings{bhattacharyya2021iccvw-sadet3d,
  title     = {{SA-Det3D: Self-Attention Based Context-Aware 3D Object Detection}},
  author    = {Bhattacharyya, Prarthana and Huang, Chengjie and Czarnecki, Krzysztof},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2021},
  pages     = {3022-3031},
  doi       = {10.1109/ICCVW54120.2021.00337},
  url       = {https://mlanthology.org/iccvw/2021/bhattacharyya2021iccvw-sadet3d/}
}