Text2Loc: 3D Point Cloud Localization from Natural Language
Abstract
We tackle the problem of 3D point cloud localization based on a few natural language descriptions and introduce a novel neural network, Text2Loc, that fully interprets the semantic relationship between points and text. Text2Loc follows a coarse-to-fine localization pipeline: text-submap global place recognition followed by fine localization. In global place recognition, the relational dynamics among the textual hints are captured in a hierarchical transformer with max-pooling (HTM), while a balance between positive and negative pairs is maintained using text-submap contrastive learning. Moreover, we propose a novel matching-free fine localization method to further refine the location predictions, which completely removes the need for complicated text-instance matching and is lighter, faster, and more accurate than previous methods. Extensive experiments show that Text2Loc improves localization accuracy by up to 2x over the state of the art on the KITTI360Pose dataset. Our project page is publicly available at: https://yan-xia.github.io/projects/text2loc/.
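The text-submap contrastive learning mentioned in the abstract can be illustrated with a CLIP-style symmetric InfoNCE objective: each text embedding is pulled toward its paired submap embedding while all other pairs in the batch act as negatives. This is a minimal NumPy sketch under that assumption; the function and variable names are ours, not from the paper's implementation.

```python
import numpy as np

def log_softmax(x, axis):
    """Numerically stable log-softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def symmetric_contrastive_loss(text_emb, submap_emb, temperature=0.07):
    """CLIP-style symmetric InfoNCE loss for a batch of paired
    text / submap embeddings; row i of each matrix is a positive pair.
    (Illustrative sketch -- not the paper's actual training code.)"""
    # L2-normalize so dot products are cosine similarities
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    s = submap_emb / np.linalg.norm(submap_emb, axis=1, keepdims=True)
    logits = t @ s.T / temperature  # (B, B) similarity matrix
    # Positives lie on the diagonal; off-diagonal entries are negatives.
    text_to_submap = -np.diag(log_softmax(logits, axis=1)).mean()
    submap_to_text = -np.diag(log_softmax(logits, axis=0)).mean()
    return 0.5 * (text_to_submap + submap_to_text)
```

With perfectly aligned embeddings the diagonal dominates and the loss approaches zero; shuffling the pairing drives it up, which is the balance between positive and negative pairs that the objective maintains.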
Cite
Text
Xia et al. "Text2Loc: 3D Point Cloud Localization from Natural Language." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.01417

Markdown

[Xia et al. "Text2Loc: 3D Point Cloud Localization from Natural Language." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/xia2024cvpr-text2loc/) doi:10.1109/CVPR52733.2024.01417

BibTeX
@inproceedings{xia2024cvpr-text2loc,
title = {{Text2Loc: 3D Point Cloud Localization from Natural Language}},
author = {Xia, Yan and Shi, Letian and Ding, Zifeng and Henriques, Joao F. and Cremers, Daniel},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2024},
  pages = {14958--14967},
doi = {10.1109/CVPR52733.2024.01417},
url = {https://mlanthology.org/cvpr/2024/xia2024cvpr-text2loc/}
}