Text2Loc: 3D Point Cloud Localization from Natural Language

Abstract

We tackle the problem of 3D point cloud localization from a few natural language descriptions and introduce a novel neural network, Text2Loc, that fully interprets the semantic relationships between points and text. Text2Loc follows a coarse-to-fine localization pipeline: text-submap global place recognition followed by fine localization. In global place recognition, relational dynamics among the textual hints are captured by a hierarchical transformer with max-pooling (HTM), while a balance between positive and negative pairs is maintained using text-submap contrastive learning. Moreover, we propose a novel matching-free fine localization method to further refine the location predictions; it completely removes the need for complicated text-instance matching and is lighter, faster, and more accurate than previous methods. Extensive experiments show that Text2Loc improves localization accuracy by up to 2x over the state of the art on the KITTI360Pose dataset. Our project page is publicly available at https://yan-xia.github.io/projects/text2loc/.
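The text-submap contrastive learning mentioned above can be illustrated with a minimal sketch. This is not the authors' code: it is a generic symmetric InfoNCE-style objective in which matched text/submap embedding pairs share a batch index and all other pairs act as negatives; the embedding dimension, batch size, and temperature are illustrative assumptions.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Project embeddings onto the unit sphere so dot products are cosines.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def text_submap_contrastive_loss(text_emb, submap_emb, temperature=0.07):
    """Symmetric InfoNCE loss: row i of text_emb is the positive match for
    row i of submap_emb; every other row in the batch is a negative."""
    t = l2_normalize(text_emb)
    s = l2_normalize(submap_emb)
    logits = t @ s.T / temperature          # (B, B) scaled cosine similarities
    labels = np.arange(len(logits))

    def xent(lg):
        # Numerically stable cross-entropy against the diagonal labels.
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[np.arange(len(lg)), labels].mean()

    # Average the text->submap and submap->text directions.
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 32))
# Perfectly aligned pairs should score a much lower loss than random pairs.
loss_aligned = text_submap_contrastive_loss(emb, emb)
loss_random = text_submap_contrastive_loss(emb, rng.normal(size=(8, 32)))
print(loss_aligned, loss_random)
```

In practice such a loss would be computed on learned text and submap encoder outputs; the point of the sketch is only the batch-wise positive/negative balance described in the abstract.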

Cite

Text

Xia et al. "Text2Loc: 3D Point Cloud Localization from Natural Language." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.01417

Markdown

[Xia et al. "Text2Loc: 3D Point Cloud Localization from Natural Language." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/xia2024cvpr-text2loc/) doi:10.1109/CVPR52733.2024.01417

BibTeX

@inproceedings{xia2024cvpr-text2loc,
  title     = {{Text2Loc: 3D Point Cloud Localization from Natural Language}},
  author    = {Xia, Yan and Shi, Letian and Ding, Zifeng and Henriques, Joao F. and Cremers, Daniel},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {14958--14967},
  doi       = {10.1109/CVPR52733.2024.01417},
  url       = {https://mlanthology.org/cvpr/2024/xia2024cvpr-text2loc/}
}