GeoReasoner: Geo-Localization with Reasoning in Street Views Using a Large Vision-Language Model

Li, Ling; Ye, Yu; Jiang, Bingchuan; Zeng, Wei

ICML 2024 pp. 29222-29233

Abstract

This work tackles the problem of geo-localization with a new paradigm using a large vision-language model (LVLM) augmented with human inference knowledge. A primary challenge here is the scarcity of data for training the LVLM: existing street-view datasets often contain numerous low-quality images lacking visual clues, and lack any reasoning inference. To address the data-quality issue, we devise a CLIP-based network to quantify the degree of street-view images being locatable, leading to the creation of a new dataset comprising highly locatable street views. To enhance reasoning inference, we integrate external knowledge obtained from real geo-localization games, tapping into valuable human inference capabilities. The data are utilized to train GeoReasoner, which undergoes fine-tuning through dedicated reasoning and location-tuning stages. Qualitative and quantitative evaluations illustrate that GeoReasoner outperforms counterpart LVLMs by more than 25% at country-level and 38% at city-level geo-localization tasks, and surpasses StreetCLIP performance while requiring fewer training resources. The data and code are available at https://github.com/lingli1996/GeoReasoner.
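The dataset-curation step described in the abstract (score each street view for how locatable it is, then keep the highly locatable ones) can be illustrated with a minimal sketch. It does not reproduce the CLIP-based scoring network; it assumes each image already carries a precomputed locatability score in [0, 1]. The names `StreetView` and `select_locatable`, the example scores, and the 0.5 threshold are hypothetical and are not taken from the paper or its repository.

```python
from dataclasses import dataclass


@dataclass
class StreetView:
    image_id: str
    locatability: float  # hypothetical precomputed score in [0, 1]


def select_locatable(views, threshold=0.5):
    """Keep views scoring at or above the threshold, most locatable first.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    kept = [v for v in views if v.locatability >= threshold]
    return sorted(kept, key=lambda v: v.locatability, reverse=True)


# Made-up scores, for illustration only.
views = [
    StreetView("a", 0.91),
    StreetView("b", 0.12),
    StreetView("c", 0.67),
]

selected = select_locatable(views, threshold=0.5)
print([v.image_id for v in selected])  # prints ['a', 'c']
```

In a real pipeline the scores would come from a trained scoring model, and the cutoff would presumably be tuned against the quality and size of the resulting dataset.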

Cite

Text

Li et al. "GeoReasoner: Geo-Localization with Reasoning in Street Views Using a Large Vision-Language Model." International Conference on Machine Learning, 2024.

Markdown

[Li et al. "GeoReasoner: Geo-Localization with Reasoning in Street Views Using a Large Vision-Language Model." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/li2024icml-georeasoner/)

BibTeX

@inproceedings{li2024icml-georeasoner,
  title     = {{GeoReasoner: Geo-Localization with Reasoning in Street Views Using a Large Vision-Language Model}},
  author    = {Li, Ling and Ye, Yu and Jiang, Bingchuan and Zeng, Wei},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {29222--29233},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/li2024icml-georeasoner/}
}