Text to Point Cloud Localization with Multi-Level Negative Contrastive Learning

Abstract

Language-based localization is a crucial task in robotics and computer vision, enabling robots to understand spatial positions through language. Recent methods rely on contrastive learning to establish correspondences between global features of texts and point clouds. However, the inherent ambiguity of textual descriptions makes it difficult to convey geometric information accurately, and forcing texts and point clouds into alignment in the feature space may compromise the expressiveness of the point clouds. Unlike previous methods, this paper proposes using language as a filter to distinguish dissimilar locations. To this end, we propose a robust framework of multi-level negative contrastive learning for language-based localization, fully leveraging the descriptive power of language for spatial localization. Our method learns multiple mismatched factors by minimizing the similarity of different locations at three levels: global, instance, and relation. Extensive experiments on the KITTI360Pose benchmark demonstrate that our method outperforms state-of-the-art methods. Specifically, we achieve a 56.3% improvement in Top-1 retrieval recall and a 45.9% improvement in 5m localization recall.
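
To make the idea of "minimizing the similarity of different locations at different levels" concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that each level (global, instance, relation) yields one text feature vector and a list of feature vectors from mismatched locations, and that the penalty is a simple hinge on cosine similarity. The function names, the `margin` parameter, and the per-level `weights` are all hypothetical; the abstract does not specify the actual loss.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def negative_loss(text_feat, negative_feats, margin=0.0):
    """Mean hinge penalty on similarity to mismatched (negative) features.

    Assumption: a negative is penalized only when its cosine similarity
    to the text feature exceeds `margin`.
    """
    penalties = [max(0.0, cosine(text_feat, neg) - margin) for neg in negative_feats]
    return sum(penalties) / len(penalties)


def multi_level_negative_loss(text, negatives, weights=None):
    """Weighted sum of negative losses over the three levels named in the abstract.

    `text` maps each level to one feature vector; `negatives` maps each
    level to a list of feature vectors from mismatched locations.
    The equal default weights are an assumption.
    """
    levels = ("global", "instance", "relation")
    if weights is None:
        weights = {level: 1.0 for level in levels}
    return sum(
        weights[level] * negative_loss(text[level], negatives[level])
        for level in levels
    )


if __name__ == "__main__":
    text = {"global": [1.0, 0.0], "instance": [0.0, 1.0], "relation": [1.0, 1.0]}
    mismatched = {
        "global": [[0.0, 1.0]],
        "instance": [[1.0, 0.0]],
        "relation": [[1.0, -1.0]],
    }
    print(multi_level_negative_loss(text, mismatched))
```

In this toy setup, negatives that are orthogonal to the text features incur no penalty, while negatives that coincide with the text features are penalized at every level, which is the behavior a negative contrastive objective is meant to encourage.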

Cite

Text

Liu et al. "Text to Point Cloud Localization with Multi-Level Negative Contrastive Learning." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I5.32574

Markdown

[Liu et al. "Text to Point Cloud Localization with Multi-Level Negative Contrastive Learning." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/liu2025aaai-text/) doi:10.1609/AAAI.V39I5.32574

BibTeX

@inproceedings{liu2025aaai-text,
  title     = {{Text to Point Cloud Localization with Multi-Level Negative Contrastive Learning}},
  author    = {Liu, Dunqiang and Huang, Shujun and Li, Wen and Shen, Siqi and Wang, Cheng},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {5397-5405},
  doi       = {10.1609/AAAI.V39I5.32574},
  url       = {https://mlanthology.org/aaai/2025/liu2025aaai-text/}
}