Voxel Field Fusion for 3D Object Detection

Abstract

In this work, we present a conceptually simple yet effective framework for cross-modality 3D object detection, named voxel field fusion. The proposed approach maintains cross-modality consistency by representing and fusing augmented image features as a ray in the voxel field. To this end, a learnable sampler is first designed to sample vital features from the image plane, which are projected to the voxel grid in a point-to-ray manner; this keeps the feature representation consistent with the spatial context. In addition, ray-wise fusion is conducted to fuse features with supplemental context in the constructed voxel field. We further develop a mixed augmentor to align feature-variant transformations, which bridges the modality gap in data augmentation. The proposed framework achieves consistent gains on various benchmarks and outperforms previous fusion-based methods on the KITTI and nuScenes datasets. Code is available at https://github.com/dvlab-research/VFF.
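The point-to-ray idea in the abstract can be illustrated with a minimal sketch: a feature sampled at one image pixel is back-projected along its camera ray and accumulated into every voxel the ray passes through, so a single 2D sample influences a line of 3D cells. This is a hypothetical illustration under assumed names and a simplified pinhole model, not the authors' implementation (see the linked repository for that).

```python
import numpy as np

def lift_pixel_to_ray(feat, pixel, K, voxel_grid, voxel_size, depths):
    """Scatter one image feature along its back-projected camera ray.

    feat       : (C,) feature vector sampled at `pixel`
    pixel      : (u, v) pixel coordinates
    K          : (3, 3) camera intrinsic matrix
    voxel_grid : (X, Y, Z, C) accumulator, camera-frame axis-aligned
    voxel_size : edge length of a voxel (same units as depth)
    depths     : 1-D array of depth hypotheses along the ray
    """
    u, v = pixel
    # Back-project the pixel: a point on the ray at depth d is d * K^{-1} [u, v, 1]^T.
    K_inv = np.linalg.inv(K)
    direction = K_inv @ np.array([u, v, 1.0])
    for d in depths:
        point = d * direction                       # 3D point on the ray
        idx = (point / voxel_size).astype(int)      # voxel index of that point
        if np.all(idx >= 0) and np.all(idx < voxel_grid.shape[:3]):
            voxel_grid[tuple(idx)] += feat          # ray-wise accumulation

# Toy usage: a 4x4x4 grid with 2-channel features and a unit-focal camera.
K = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 2.0],
              [0.0, 0.0, 1.0]])
grid = np.zeros((4, 4, 4, 2))
# The pixel at the principal point maps straight down the z-axis of the grid.
lift_pixel_to_ray(np.ones(2), (2.0, 2.0), K, grid, 1.0, np.arange(0.5, 4.0, 1.0))
```

In the paper this dense depth sweep is replaced by learned sampling and ray-wise fusion, but the sketch captures the key representational choice: a pixel contributes to a ray of voxels rather than a single projected point.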

Cite

Text

Li et al. "Voxel Field Fusion for 3D Object Detection." Conference on Computer Vision and Pattern Recognition, 2022. doi:10.1109/CVPR52688.2022.00119

Markdown

[Li et al. "Voxel Field Fusion for 3D Object Detection." Conference on Computer Vision and Pattern Recognition, 2022.](https://mlanthology.org/cvpr/2022/li2022cvpr-voxel/) doi:10.1109/CVPR52688.2022.00119

BibTeX

@inproceedings{li2022cvpr-voxel,
  title     = {{Voxel Field Fusion for 3D Object Detection}},
  author    = {Li, Yanwei and Qi, Xiaojuan and Chen, Yukang and Wang, Liwei and Li, Zeming and Sun, Jian and Jia, Jiaya},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2022},
  pages     = {1120-1129},
  doi       = {10.1109/CVPR52688.2022.00119},
  url       = {https://mlanthology.org/cvpr/2022/li2022cvpr-voxel/}
}