NeRF-SOS: Any-View Self-Supervised Object Segmentation on Complex Scenes

Abstract

Neural volumetric representations have shown that multi-layer perceptrons (MLPs) can be optimized with multi-view calibrated images to represent scene geometry and appearance without explicit 3D supervision. Object segmentation can enrich many downstream applications based on the learned radiance field. However, introducing hand-crafted segmentation to define regions of interest in a complex real-world scene is non-trivial and expensive, as it requires per-view annotation. This paper explores self-supervised learning for object segmentation using NeRF in complex real-world scenes. Our framework, called NeRF with Self-supervised Object Segmentation (NeRF-SOS), couples object segmentation and neural radiance fields to segment objects in any view within a scene. By proposing a novel collaborative contrastive loss at both the appearance and geometry levels, NeRF-SOS encourages NeRF models to distill compact geometry-aware segmentation clusters from their density fields and from self-supervised pre-trained 2D visual features. The self-supervised object segmentation framework can be applied to various NeRF models, yielding both photo-realistic rendering results and convincing segmentation maps in indoor and outdoor scenarios. Extensive results on the LLFF, BlendedMVS, CO3Dv2, and Tanks and Temples datasets validate the effectiveness of NeRF-SOS. It consistently surpasses other 2D-based self-supervised baselines and predicts finer object masks than existing supervised counterparts.

Cite

Text

Fan et al. "NeRF-SOS: Any-View Self-Supervised Object Segmentation on Complex Scenes." International Conference on Learning Representations, 2023.

Markdown

[Fan et al. "NeRF-SOS: Any-View Self-Supervised Object Segmentation on Complex Scenes." International Conference on Learning Representations, 2023.](https://mlanthology.org/iclr/2023/fan2023iclr-nerfsos/)

BibTeX

@inproceedings{fan2023iclr-nerfsos,
  title     = {{NeRF-SOS: Any-View Self-Supervised Object Segmentation on Complex Scenes}},
  author    = {Fan, Zhiwen and Wang, Peihao and Jiang, Yifan and Gong, Xinyu and Xu, Dejia and Wang, Zhangyang},
  booktitle = {International Conference on Learning Representations},
  year      = {2023},
  url       = {https://mlanthology.org/iclr/2023/fan2023iclr-nerfsos/}
}