PanoOcc: Unified Occupancy Representation for Camera-Based 3D Panoptic Segmentation

Abstract

Comprehensive modeling of the surrounding 3D world is crucial for the success of autonomous driving. However, existing perception tasks like object detection, road structure segmentation, depth & elevation estimation, and open-set object localization each only focus on a small facet of the holistic 3D scene understanding task. This divide-and-conquer strategy simplifies the algorithm development process but comes at the cost of losing an end-to-end unified solution to the problem. In this work, we address this limitation by studying camera-based 3D panoptic segmentation, aiming to achieve a unified occupancy representation for camera-only 3D scene understanding. To achieve this, we introduce a novel method called PanoOcc, which utilizes voxel queries to aggregate spatiotemporal information from multi-frame and multi-view images in a coarse-to-fine scheme, integrating feature learning and scene representation into a unified occupancy representation. We have conducted extensive ablation studies to validate the effectiveness and efficiency of the proposed method. Our approach achieves new state-of-the-art results for camera-based semantic segmentation and panoptic segmentation on the nuScenes dataset. Furthermore, our method can be easily extended to dense occupancy prediction and has demonstrated promising performance on the Occ3D benchmark. The code will be made available at https://github.com/Robertwyq/PanoOcc.

Cite

Text

Wang et al. "PanoOcc: Unified Occupancy Representation for Camera-Based 3D Panoptic Segmentation." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.01624

Markdown

[Wang et al. "PanoOcc: Unified Occupancy Representation for Camera-Based 3D Panoptic Segmentation." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/wang2024cvpr-panoocc/) doi:10.1109/CVPR52733.2024.01624

BibTeX

@inproceedings{wang2024cvpr-panoocc,
  title     = {{PanoOcc: Unified Occupancy Representation for Camera-Based 3D Panoptic Segmentation}},
  author    = {Wang, Yuqi and Chen, Yuntao and Liao, Xingyu and Fan, Lue and Zhang, Zhaoxiang},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {17158--17168},
  doi       = {10.1109/CVPR52733.2024.01624},
  url       = {https://mlanthology.org/cvpr/2024/wang2024cvpr-panoocc/}
}