ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection

Abstract

In this paper, we introduce the task of multi-view RGB-based 3D object detection as an end-to-end optimization problem. To address this problem, we propose ImVoxelNet, a novel fully convolutional method for 3D object detection based on posed monocular or multi-view RGB images. The number of monocular images in each multi-view input can vary during training and inference; in fact, this number may be different for each multi-view input. ImVoxelNet successfully handles both indoor and outdoor scenes, which makes it general-purpose. Specifically, it achieves state-of-the-art results in car detection on the KITTI (monocular) and nuScenes (multi-view) benchmarks among all methods that accept RGB images. Moreover, it surpasses existing RGB-based 3D object detection methods on the SUN RGB-D dataset. On ScanNet, ImVoxelNet sets a new benchmark for multi-view 3D object detection.
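
The abstract does not spell out the projection step, so the following is only an illustrative sketch of how image features from a variable number of posed views could be gathered into voxels. It assumes a pinhole camera model, nearest-pixel sampling, and averaging over the views in which a voxel is visible. The function names `project_voxels` and `aggregate_views`, the array layouts, and the toy inputs are hypothetical and are not taken from the authors' implementation.

```python
import numpy as np


def project_voxels(points, intrinsics, extrinsics):
    """Project N world-space points (N, 3) into one pinhole camera.

    intrinsics: (3, 3) camera matrix; extrinsics: (4, 4) world-to-camera.
    Returns pixel coordinates (N, 2) and camera-space depth (N,).
    """
    homo = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    cam = (extrinsics @ homo.T).T[:, :3]       # world -> camera frame
    depth = cam[:, 2]
    pix = (intrinsics @ cam.T).T               # camera frame -> image plane
    # Guard the division; points with depth <= 0 are masked out by the caller.
    safe_depth = np.where(depth > 0, depth, 1.0)
    return pix[:, :2] / safe_depth[:, None], depth


def aggregate_views(points, feature_maps, intrinsics_list, extrinsics_list):
    """Average image features over the views in which each point is visible.

    feature_maps: list of (C, H, W) arrays, one per view; the list may have
    any length, including 1 (the monocular case).
    Returns per-point features (N, C) and the number of views that saw each point.
    """
    channels = feature_maps[0].shape[0]
    total = np.zeros((len(points), channels))
    count = np.zeros(len(points))
    for feat, K, E in zip(feature_maps, intrinsics_list, extrinsics_list):
        _, height, width = feat.shape
        uv, depth = project_voxels(points, K, E)
        cols = np.round(uv[:, 0]).astype(int)
        rows = np.round(uv[:, 1]).astype(int)
        # A point contributes only if it is in front of the camera and in frame.
        valid = (depth > 0) & (cols >= 0) & (cols < width) \
            & (rows >= 0) & (rows < height)
        total[valid] += feat[:, rows[valid], cols[valid]].T
        count[valid] += 1
    # Points seen by no view keep an all-zero feature vector.
    return total / np.maximum(count, 1)[:, None], count


# Toy usage: two voxel centers, one in front of the camera and one behind it.
K = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [0.0, 0.0, 1.0]])
E = np.eye(4)
voxel_centers = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]])
view_a = np.arange(18, dtype=float).reshape(2, 3, 3)
view_b = view_a + 2.0
features, seen = aggregate_views(voxel_centers, [view_a, view_b], [K, K], [E, E])
```

Because the average is taken over whichever views see a voxel, the same function accepts one image or many, which is the property the abstract highlights for inputs with a varying number of views.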

Cite

Text

Rukhovich et al. "ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection." Winter Conference on Applications of Computer Vision, 2022.

Markdown

[Rukhovich et al. "ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection." Winter Conference on Applications of Computer Vision, 2022.](https://mlanthology.org/wacv/2022/rukhovich2022wacv-imvoxelnet/)

BibTeX

@inproceedings{rukhovich2022wacv-imvoxelnet,
  title     = {{ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection}},
  author    = {Rukhovich, Danila and Vorontsova, Anna and Konushin, Anton},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2022},
  pages     = {2397-2406},
  url       = {https://mlanthology.org/wacv/2022/rukhovich2022wacv-imvoxelnet/}
}