O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation

Abstract

Online construction of open-ended language scenes is crucial for robotic applications that require open-vocabulary, interactive scene understanding. Recently, neural implicit representations have offered a promising direction for online interactive mapping. However, integrating open-vocabulary scene understanding into online neural implicit mapping still faces three challenges: the inability to update the scene locally, blurry spatially hierarchical semantic segmentation, and difficulty maintaining multi-view consistency. To this end, we propose O2V-mapping, which uses voxel-based language and geometric features to build an open-vocabulary field, allowing local updates during online training. Additionally, we leverage a foundation model for image segmentation to extract language features for object-level entities, yielding clear segmentation boundaries and hierarchical semantic features. To preserve the consistency of 3D object properties across viewpoints, we propose a spatially adaptive voxel adjustment mechanism and a multi-view weight selection method. Extensive experiments on open-vocabulary object localization and semantic segmentation demonstrate that O2V-mapping constructs language scenes online with improved accuracy, outperforming the previous state-of-the-art method.
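The abstract does not give the update rules, so the following is only a minimal sketch of the general idea behind voxel-based feature storage with local updates. It is not the O2V-mapping implementation. Assumptions (all hypothetical): the class name `SparseFeatureVoxelGrid`, a dictionary keyed by integer voxel coordinates, a weighted running mean in which a per-observation `weight` stands in for a multi-view weighting, and a brute-force cosine-similarity `query` for open-vocabulary lookup.

```python
from math import floor, sqrt
from typing import Dict, List, Sequence, Tuple

Voxel = Tuple[int, int, int]


class SparseFeatureVoxelGrid:
    """Hypothetical sparse voxel store (illustrative only, not the authors' code).

    Each occupied voxel keeps a weighted running mean of the feature vectors
    (e.g. language embeddings) observed inside it, plus the accumulated weight.
    """

    def __init__(self, voxel_size: float, dim: int) -> None:
        self.voxel_size = voxel_size
        self.dim = dim
        self.features: Dict[Voxel, List[float]] = {}
        self.weights: Dict[Voxel, float] = {}

    def key(self, point: Sequence[float]) -> Voxel:
        # Quantize a 3D point to integer voxel coordinates.
        x, y, z = (floor(c / self.voxel_size) for c in point)
        return (x, y, z)

    def integrate(self, point: Sequence[float], feature: Sequence[float], weight: float) -> None:
        # Local update: only the voxel containing `point` changes; other voxels are untouched.
        if len(feature) != self.dim:
            raise ValueError("feature dimension mismatch")
        if weight <= 0:
            raise ValueError("weight must be positive")
        k = self.key(point)
        w_old = self.weights.get(k, 0.0)
        f_old = self.features.get(k, [0.0] * self.dim)
        w_new = w_old + weight
        # Weighted running mean: observations with larger (hypothetical) view weights count more.
        self.features[k] = [(w_old * a + weight * b) / w_new for a, b in zip(f_old, feature)]
        self.weights[k] = w_new

    def query(self, text_feature: Sequence[float]) -> Voxel:
        # Open-vocabulary lookup: the voxel whose feature is most cosine-similar to the query.
        if not self.features:
            raise LookupError("grid is empty")

        def cosine(a: Sequence[float], b: Sequence[float]) -> float:
            na = sqrt(sum(x * x for x in a))
            nb = sqrt(sum(x * x for x in b))
            if na == 0 or nb == 0:
                return -1.0
            return sum(x * y for x, y in zip(a, b)) / (na * nb)

        return max(self.features, key=lambda k: cosine(self.features[k], text_feature))
```

For example, with `voxel_size=0.1`, integrating feature `[1.0, 0.0]` with weight 1 at `(0.05, 0.02, 0.0)` and then `[0.0, 1.0]` with weight 3 at `(0.07, 0.01, 0.03)` lands both in voxel `(0, 0, 0)`, whose stored feature becomes `[0.25, 0.75]`. Because storage is sparse, a new frame costs work proportional to the points it observes rather than to the whole map. The linear scan in `query` is chosen for clarity, not speed.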

Cite

Text

Tie et al. "O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73021-4_19

Markdown

[Tie et al. "O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/tie2024eccv-o2vmapping/) doi:10.1007/978-3-031-73021-4_19

BibTeX

@inproceedings{tie2024eccv-o2vmapping,
  title     = {{O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation}},
  author    = {Tie, Muer and Wei, Julong and Wang, Zhengjun and Wu, Ke and Yuan, Shanshuai and Zhang, Kaizhao and Jia, Jie and Zhao, Jieru and Gan, Zhongxue and Ding, Wenchao},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-73021-4_19},
  url       = {https://mlanthology.org/eccv/2024/tie2024eccv-o2vmapping/}
}