AShapeFormer: Semantics-Guided Object-Level Active Shape Encoding for 3D Object Detection via Transformers

Abstract

3D object detection techniques commonly follow a pipeline that aggregates predicted object central point features to compute candidate points. However, these candidate points contain only positional information, largely ignoring the object-level shape information. This eventually leads to sub-optimal 3D object detection. In this work, we propose AShapeFormer, a semantics-guided object-level shape encoding module for 3D object detection. This is a plug-and-play module that leverages multi-head attention to encode object shape information. We also propose shape tokens and object-scene positional encoding to ensure that the shape information is fully exploited. Moreover, we introduce a semantic guidance sub-module to sample more foreground points and suppress the influence of background points for a better object shape perception. We demonstrate a straightforward enhancement of multiple existing methods with our AShapeFormer. Through extensive experiments on the popular SUN RGB-D and ScanNetV2 datasets, we show that our enhanced models are able to outperform the baselines by a considerable absolute margin of up to 8.1%. Code will be available at https://github.com/ZechuanLi/AShapeFormer
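The abstract names the ingredients of the module (semantics-guided sampling of foreground points, an object-scene positional encoding, a shape token, and multi-head attention) without spelling out how they connect. The NumPy sketch below shows one plausible way to combine them for a single candidate object. It is an illustrative simplification, not the AShapeFormer implementation: the function names (`encode_shape`, `semantic_guided_sample`, `object_scene_pos_enc`), the top-k sampling rule, the log-score attention bias used to suppress background points, the concatenated relative/absolute coordinate encoding, and the use of the shape token as a single attention query are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def semantic_guided_sample(scores, k):
    # Hypothetical rule: keep the k points with the highest foreground score.
    return np.argsort(-scores)[:k]

def object_scene_pos_enc(points, center, w_pos):
    # Hypothetical encoding: object-relative coords (points - center)
    # concatenated with scene-absolute coords, then linearly projected.
    geo = np.concatenate([points - center, points], axis=-1)  # (k, 6)
    return geo @ w_pos                                         # (k, d)

def multi_head_attention(query, keys, values, num_heads, bias=None):
    # Single-query multi-head attention: query (d,), keys/values (k, d).
    d = query.shape[-1]
    dh = d // num_heads
    q = query.reshape(num_heads, dh)                           # (h, dh)
    k = keys.reshape(-1, num_heads, dh).transpose(1, 0, 2)     # (h, k, dh)
    v = values.reshape(-1, num_heads, dh).transpose(1, 0, 2)   # (h, k, dh)
    logits = np.einsum("hd,hkd->hk", q, k) / np.sqrt(dh)       # (h, k)
    if bias is not None:
        logits = logits + bias          # (k,) bias broadcast over heads
    attn = softmax(logits, axis=-1)
    out = np.einsum("hk,hkd->hd", attn, v)                     # (h, dh)
    return out.reshape(d), attn

def encode_shape(points, feats, scores, center, shape_token, w_pos,
                 k=16, num_heads=4):
    # 1) Semantic guidance: sample likely foreground points.
    idx = semantic_guided_sample(scores, k)
    p, f, s = points[idx], feats[idx], scores[idx]
    # 2) Add the object-scene positional encoding to point features.
    x = f + object_scene_pos_enc(p, center, w_pos)
    # 3) log(score) bias down-weights points with low foreground probability.
    bias = np.log(np.clip(s, 1e-6, 1.0))
    # 4) The shape token attends over the sampled points.
    shape_feat, attn = multi_head_attention(shape_token, x, x, num_heads, bias)
    return shape_feat, attn, idx

rng = np.random.default_rng(0)
N, d = 64, 32
points = rng.normal(size=(N, 3))
feats = rng.normal(size=(N, d))
scores = rng.uniform(size=N)            # stand-in foreground probabilities
center = points.mean(axis=0)            # stand-in candidate center
shape_token = rng.normal(size=d)        # stands in for a learned token
w_pos = rng.normal(size=(6, d)) * 0.1   # stands in for learned weights
shape_feat, attn, idx = encode_shape(points, feats, scores, center,
                                     shape_token, w_pos)
print(shape_feat.shape, attn.shape)     # (32,) (4, 16)
```

In this sketch the resulting `shape_feat` vector would be combined with the positional candidate-point feature before box regression; the random weights only stand in for learned parameters.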

Cite

Text

Li et al. "AShapeFormer: Semantics-Guided Object-Level Active Shape Encoding for 3D Object Detection via Transformers." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.00104

Markdown

[Li et al. "AShapeFormer: Semantics-Guided Object-Level Active Shape Encoding for 3D Object Detection via Transformers." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/li2023cvpr-ashapeformer/) doi:10.1109/CVPR52729.2023.00104

BibTeX

@inproceedings{li2023cvpr-ashapeformer,
  title     = {{AShapeFormer: Semantics-Guided Object-Level Active Shape Encoding for 3D Object Detection via Transformers}},
  author    = {Li, Zechuan and Yu, Hongshan and Yang, Zhengeng and Chen, Tongjia and Akhtar, Naveed},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {1012--1021},
  doi       = {10.1109/CVPR52729.2023.00104},
  url       = {https://mlanthology.org/cvpr/2023/li2023cvpr-ashapeformer/}
}