PanSt3R: Multi-View Consistent Panoptic Segmentation

Abstract

Panoptic segmentation in 3D is a fundamental problem in scene understanding. Existing approaches typically rely on costly test-time optimization (often based on NeRF) to consolidate the 2D predictions of off-the-shelf panoptic segmentation methods into 3D. In this work, we instead propose a unified and integrated approach, PanSt3R, which eliminates the need for test-time optimization by jointly predicting 3D geometry and multi-view-consistent panoptic segmentation in a single forward pass. Our approach harnesses the 3D representations of MUSt3R, a recent scalable multi-view extension of DUSt3R, together with the 2D representations of DINOv2, and performs joint multi-view panoptic prediction via a mask transformer architecture. We additionally revisit the standard mask-merging post-processing procedure and introduce a more principled approach to multi-view segmentation. Finally, we present a simple method for generating novel-view predictions from the outputs of PanSt3R combined with vanilla 3DGS. Overall, PanSt3R is conceptually simple yet fast and scalable, achieving state-of-the-art performance on several benchmarks while being orders of magnitude faster than optimization-based alternatives.
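To make the described pipeline concrete, below is a minimal, illustrative PyTorch-style sketch of a single-forward-pass design in the spirit of the abstract: per-view 2D features (DINOv2-like) are fused with multi-view 3D features (MUSt3R-like), and a mask-transformer head decodes one shared set of panoptic queries against all views jointly, so each query yields a consistent instance across the whole image set. All module names, shapes, and the fusion/decoding scheme are assumptions for illustration, not the authors' actual implementation.

# Minimal sketch of a single-forward-pass multi-view panoptic head.
# The projections below stand in for frozen MUSt3R (3D) and DINOv2 (2D)
# backbones; everything here is an illustrative assumption, not the
# paper's actual architecture.
import torch
import torch.nn as nn

class MultiViewPanopticSketch(nn.Module):
    def __init__(self, feat_dim=256, num_queries=100, num_classes=21):
        super().__init__()
        self.geo_proj = nn.Linear(feat_dim, feat_dim)  # MUSt3R-like 3D features
        self.sem_proj = nn.Linear(feat_dim, feat_dim)  # DINOv2-like 2D features
        # A single set of queries shared across all views ties instance
        # identities together across viewpoints by construction.
        self.queries = nn.Embedding(num_queries, feat_dim)
        layer = nn.TransformerDecoderLayer(d_model=feat_dim, nhead=8,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=3)
        self.class_head = nn.Linear(feat_dim, num_classes + 1)  # +1: "no object"

    def forward(self, geo_feats, sem_feats):
        # geo_feats, sem_feats: (V, N, C) token features, V views x N tokens.
        V, N, C = geo_feats.shape
        fused = self.geo_proj(geo_feats) + self.sem_proj(sem_feats)
        tokens = fused.reshape(1, V * N, C)        # pool tokens from all views
        q = self.queries.weight.unsqueeze(0)       # (1, Q, C), shared queries
        q = self.decoder(q, tokens)                # attend over all views at once
        logits = self.class_head(q)                # (1, Q, K+1) per-query class
        # Mask logits: dot product of each query with every token of every
        # view, giving one consistent mask per query across the image set.
        masks = torch.einsum('bqc,bnc->bqn', q, tokens)   # (1, Q, V*N)
        masks = masks.reshape(-1, V, N).transpose(0, 1)   # (V, Q, N)
        return logits, masks

if __name__ == "__main__":
    V, N, C = 4, 196, 256
    model = MultiViewPanopticSketch(feat_dim=C)
    logits, masks = model(torch.randn(V, N, C), torch.randn(V, N, C))
    print(logits.shape, masks.shape)  # (1, 100, 22) and (4, 100, 196)

Because every view's mask is decoded from the same query embedding, no post-hoc matching of instance IDs between views is needed in this sketch; that is the property the single-forward-pass design is meant to convey.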

Cite

Text

Zust et al. "PanSt3R: Multi-View Consistent Panoptic Segmentation." International Conference on Computer Vision, 2025.

Markdown

[Zust et al. "PanSt3R: Multi-View Consistent Panoptic Segmentation." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/zust2025iccv-panst3r/)

BibTeX

@inproceedings{zust2025iccv-panst3r,
  title     = {{PanSt3R: Multi-View Consistent Panoptic Segmentation}},
  author    = {Zust, Lojze and Cabon, Yohann and Marrie, Juliette and Antsfeld, Leonid and Chidlovskii, Boris and Revaud, Jerome and Csurka, Gabriela},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {5856--5866},
  url       = {https://mlanthology.org/iccv/2025/zust2025iccv-panst3r/}
}