Where Am I? Cross-View Geo-Localization with Natural Language Descriptions

Junyan Ye, Honglin Lin, Leyan Ou, Dairong Chen, Zihao Wang, Qi Zhu, Conghui He, Weijia Li

ICCV 2025 pp. 5890-5900


Abstract

Cross-view geo-localization identifies the locations of street-view images by matching them with geo-tagged satellite images or OpenStreetMap (OSM) data. However, most existing studies focus on image-to-image retrieval, and few address text-guided retrieval, a task vital for applications such as pedestrian navigation and emergency response. In this work, we introduce a novel task, cross-view geo-localization with natural language descriptions, which aims to retrieve the corresponding satellite images or OSM data from scene text descriptions. To support this task, we construct the CVG-Text dataset by collecting cross-view data from multiple cities and employing a scene text generation approach that leverages the annotation capabilities of large multimodal models (LMMs) to produce high-quality scene text descriptions with localization details. Additionally, we propose a novel text-based retrieval localization method, CrossText2Loc, which improves recall by 10% and demonstrates excellent long-text retrieval capabilities. For explainability, it provides not only similarity scores but also the reasons for each retrieval. More information can be found at https://github.com/yejy53/CVG-Text
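The abstract reports retrieval quality as recall and mentions similarity scores. As a hedged, generic illustration only (this is not CrossText2Loc, whose architecture is not described here), the sketch below assumes text queries and satellite images have already been embedded into a shared vector space by some encoder, with row i of each matrix forming a matched pair. It ranks database images by cosine similarity and computes Recall@K. The function name `recall_at_k` and the toy vectors are hypothetical.

```python
import numpy as np


def recall_at_k(text_emb: np.ndarray, image_emb: np.ndarray, k: int) -> float:
    """Fraction of text queries whose matching image (same row index) ranks in the top k.

    Assumes text_emb[i] and image_emb[i] describe the same location (hypothetical setup).
    """
    # L2-normalise rows so the dot product equals cosine similarity
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    sim = t @ v.T  # sim[i, j]: similarity of text query i to database image j

    # Rank of the ground-truth image = number of images scoring strictly higher
    true_scores = np.diag(sim)
    ranks = (sim > true_scores[:, None]).sum(axis=1)
    return float((ranks < k).mean())


# Toy 2-D embeddings for three query/image pairs (made-up numbers)
text = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
images = np.array([[1.0, 0.1], [0.1, 1.0], [0.0, 1.0]])

for k in (1, 2, 3):
    print(f"Recall@{k}: {recall_at_k(text, images, k):.3f}")
```

With these toy vectors, the first query retrieves its image at rank 1, the second at rank 2, and the third at rank 3, so Recall@1, Recall@2, and Recall@3 are 1/3, 2/3, and 1. Counting strictly higher scores means ties are resolved in favour of the ground-truth image, which is one of several possible conventions.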


Cite

Text

Ye et al. "Where Am I? Cross-View Geo-Localization with Natural Language Descriptions." International Conference on Computer Vision, 2025.

Markdown

[Ye et al. "Where Am I? Cross-View Geo-Localization with Natural Language Descriptions." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/ye2025iccv-am/)

BibTeX

@inproceedings{ye2025iccv-am,
  title     = {{Where Am I? Cross-View Geo-Localization with Natural Language Descriptions}},
  author    = {Ye, Junyan and Lin, Honglin and Ou, Leyan and Chen, Dairong and Wang, Zihao and Zhu, Qi and He, Conghui and Li, Weijia},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {5890-5900},
  url       = {https://mlanthology.org/iccv/2025/ye2025iccv-am/}
}