Uni-3D: A Universal Model for Panoptic 3D Scene Reconstruction

Abstract

Performing holistic 3D scene understanding from a single-view observation, which involves generating instance shapes as well as a 3D segmentation of the scene, is a long-standing challenge. Prevailing works either focus on geometry or segmentation alone, or split the task between separate modules whose results are merged afterwards to form the final prediction. Inspired by recent advances in 2D vision that unify image segmentation and detection with Transformer-based models, we present Uni-3D, a holistic 3D scene parsing and reconstruction system that operates on a single RGB image. Uni-3D features a universal model with query-based representations for predicting segments of both object instances and the scene layout. Within Uni-3D, we also introduce a single Transformer for 2D depth-aware panoptic segmentation, whose queries serve as strong shape priors in 3D. Uni-3D seamlessly integrates 2D and 3D in its architecture and significantly outperforms previous methods.
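To make the query-based design concrete, below is a minimal PyTorch sketch of the general idea: a single set of learned queries attends to image features, drives the 2D depth-aware panoptic heads (class, mask, coarse depth), and is then reused as a shape prior for per-instance 3D occupancy. All module names, layer choices, and sizes here are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn

class QueryBased2DTo3D(nn.Module):
    """Illustrative sketch of a query-based 2D-to-3D pipeline.

    Assumption-heavy: the backbone, head designs, and dimensions are
    stand-ins, not the Uni-3D architecture itself.
    """

    def __init__(self, num_queries=100, d_model=256, num_classes=21, voxel_dim=32):
        super().__init__()
        # Patchify stand-in for a real image backbone.
        self.backbone = nn.Conv2d(3, d_model, kernel_size=16, stride=16)
        self.queries = nn.Embedding(num_queries, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=3)
        # 2D depth-aware panoptic heads: class logits, mask embedding, depth.
        self.class_head = nn.Linear(d_model, num_classes + 1)  # +1 for "no object"
        self.mask_head = nn.Linear(d_model, d_model)
        self.depth_head = nn.Linear(d_model, 1)
        # 3D head: the same queries act as shape priors for occupancy grids.
        self.shape_head = nn.Linear(d_model, voxel_dim ** 3)
        self.voxel_dim = voxel_dim

    def forward(self, image):
        feats = self.backbone(image)                # (B, C, H/16, W/16)
        B, C, H, W = feats.shape
        tokens = feats.flatten(2).transpose(1, 2)   # (B, HW, C)
        q = self.queries.weight.unsqueeze(0).expand(B, -1, -1)
        q = self.decoder(q, tokens)                 # queries attend to image features
        class_logits = self.class_head(q)           # (B, Q, K+1)
        mask_embed = self.mask_head(q)              # (B, Q, C)
        masks2d = torch.einsum("bqc,bchw->bqhw", mask_embed, feats)
        depth = self.depth_head(q)                  # coarse per-query depth
        # Reuse the refined queries as shape priors for per-instance 3D occupancy.
        occ3d = self.shape_head(q).view(
            B, -1, self.voxel_dim, self.voxel_dim, self.voxel_dim)
        return class_logits, masks2d, depth, occ3d


# Usage: one forward pass on a dummy image.
model = QueryBased2DTo3D()
img = torch.randn(1, 3, 256, 256)
cls, masks, depth, occ = model(img)
print(cls.shape, masks.shape, depth.shape, occ.shape)
```

The point of the sketch is the shared representation: because the 2D panoptic heads and the 3D occupancy head read from the same refined queries, the 2D segments and the 3D reconstruction stay consistent per instance instead of being merged from separate modules.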

Cite

Text

Zhang et al. "Uni-3D: A Universal Model for Panoptic 3D Scene Reconstruction." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.00849

Markdown

[Zhang et al. "Uni-3D: A Universal Model for Panoptic 3D Scene Reconstruction." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/zhang2023iccv-uni3d/) doi:10.1109/ICCV51070.2023.00849

BibTeX

@inproceedings{zhang2023iccv-uni3d,
  title     = {{Uni-3D: A Universal Model for Panoptic 3D Scene Reconstruction}},
  author    = {Zhang, Xiang and Chen, Zeyuan and Wei, Fangyin and Tu, Zhuowen},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {9256--9266},
  doi       = {10.1109/ICCV51070.2023.00849},
  url       = {https://mlanthology.org/iccv/2023/zhang2023iccv-uni3d/}
}