Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds

Chenhang He, Ruihuang Li, Shuai Li, Lei Zhang

CVPR 2022 pp. 8417-8427

doi:10.1109/CVPR52688.2022.00823

Abstract

Transformers have demonstrated promising performance in many 2D vision tasks. However, it is cumbersome to apply the self-attention underlying transformers to large-scale point cloud data, because a point cloud is a long sequence that is unevenly distributed in 3D space. To address this issue, existing methods usually either compute self-attention locally by grouping the points into clusters of the same size, or perform convolutional self-attention on a discretized representation. However, the former results in stochastic point dropout, while the latter typically has a narrow attention field. In this paper, we propose a novel voxel-based architecture, namely Voxel Set Transformer (VoxSeT), to detect 3D objects from point clouds by means of set-to-set translation. VoxSeT is built upon a voxel-based set attention (VSA) module, which reduces the self-attention in each voxel to two cross-attentions and models features in a hidden space induced by a group of latent codes. With the VSA module, VoxSeT can handle voxelized point clusters of arbitrary size over a wide range, and process them in parallel with linear complexity. The proposed VoxSeT combines the high performance of transformers with the efficiency of voxel-based models, and can be used as a good alternative to convolutional and point-based backbones. VoxSeT reports competitive results on the KITTI and Waymo detection benchmarks. The source code of VoxSeT will be released.
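To make the idea of replacing per-voxel self-attention with two cross-attentions through latent codes more concrete, the following is a minimal NumPy sketch. It is not the authors' implementation: the function name `voxel_set_attention`, the single-head dot-product form, the absence of learned projections and feed-forward layers, and the per-voxel Python loop are all simplifying assumptions made for illustration. It only shows why the attention cost grows linearly with the number of points: each voxel with m points attends through k latent codes, costing on the order of m·k rather than m².

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def voxel_set_attention(feats, voxel_ids, latents):
    """Illustrative two-step cross-attention per voxel (hypothetical helper).

    feats:     (n, d) point features
    voxel_ids: (n,)   integer voxel index of each point
    latents:   (k, d) latent codes shared by all voxels
    returns:   (n, d) updated point features
    """
    feats = np.asarray(feats, dtype=float)
    voxel_ids = np.asarray(voxel_ids)
    latents = np.asarray(latents, dtype=float)
    out = np.empty_like(feats)
    scale = 1.0 / np.sqrt(feats.shape[1])

    # Group point indices by voxel: sort once, then split at id changes.
    order = np.argsort(voxel_ids, kind="stable")
    boundaries = np.flatnonzero(np.diff(voxel_ids[order])) + 1
    for idx in np.split(order, boundaries):
        if idx.size == 0:
            continue
        x = feats[idx]  # (m, d): the points of one voxel, any m >= 1

        # Cross-attention 1: latent codes query the points.
        # Softmax over the m points yields k hidden vectors per voxel.
        enc = softmax(latents @ x.T * scale, axis=1)  # (k, m)
        hidden = enc @ x                              # (k, d)

        # Cross-attention 2: points query the hidden vectors.
        # Softmax over the k latent slots gives one output per point.
        dec = softmax(x @ hidden.T * scale, axis=1)   # (m, k)
        out[idx] = dec @ hidden                       # (m, d)
    return out
```

Because the softmax in the first step runs over however many points a voxel contains, no points need to be dropped or padded to reach a fixed cluster size. A practical implementation would presumably replace the loop with batched scatter-style reductions so that all voxels are processed in parallel.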

Cite

Text

He et al. "Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds." Conference on Computer Vision and Pattern Recognition, 2022. doi:10.1109/CVPR52688.2022.00823

Markdown

[He et al. "Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds." Conference on Computer Vision and Pattern Recognition, 2022.](https://mlanthology.org/cvpr/2022/he2022cvpr-voxel/) doi:10.1109/CVPR52688.2022.00823

BibTeX

@inproceedings{he2022cvpr-voxel,
  title     = {{Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds}},
  author    = {He, Chenhang and Li, Ruihuang and Li, Shuai and Zhang, Lei},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2022},
  pages     = {8417--8427},
  doi       = {10.1109/CVPR52688.2022.00823},
  url       = {https://mlanthology.org/cvpr/2022/he2022cvpr-voxel/}
}