VimGeo: Efficient Cross-View Geo-Localization with Vision Mamba Architecture
Abstract
Cross-view geo-localization is a crucial task with diverse applications, yet it remains challenging due to the significant variations in viewpoints and visual appearances between images from different perspectives. While recent advancements have been made, existing methods often suffer from high model complexity, excessive resource consumption, and the impact of sample learning difficulty on optimization. To overcome these limitations, we optimize the Vision Mamba (Vim) model, built on a State Space Model (SSM) architecture, by replacing the traditional classification head with Channel Group Pooling (CGP) for efficient feature integration. This optimization reduces model parameters by 1.5% and computational complexity by 0.4%. Additionally, we propose a novel Dynamic Weighted Batch-tuple Loss (DWBL) to dynamically adjust the weighting of negative samples, improving model performance. By combining CGP and DWBL, we develop an efficient end-to-end network, VimGeo, which achieves state-of-the-art performance with enhanced computational efficiency. Specifically, VimGeo achieves a Recall@1 of 81.67% on the CVACT_test dataset, outperforming prior approaches. Extensive experiments on CVUSA, CVACT, and VIGOR datasets validate VimGeo's effectiveness and competitiveness in cross-view geo-localization tasks, achieving the leading results among sequence modeling-based methods. The implementation is available at: https://github.com/VimGeoTeam/VimGeo.
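The Recall@1 figure quoted above is a retrieval metric: the share of query images whose true match is ranked first among all reference images. The sketch below shows one common way to compute Recall@K from a similarity matrix. It is not taken from the VimGeo code base. The function name `recall_at_k`, the one-to-one pairing of query `i` with reference `i`, and the toy scores are all assumptions made for illustration.

```python
from typing import Sequence


def recall_at_k(similarity: Sequence[Sequence[float]], k: int = 1) -> float:
    """Fraction of queries whose true match ranks in the top k.

    Assumes similarity[i][j] scores query i against reference j, and that
    the true match of query i is reference i (a one-to-one pairing).
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Count references that score strictly higher than the true match.
        better = sum(1 for score in row if score > row[i])
        if better < k:
            hits += 1
    return hits / len(similarity)


# Toy example: 3 ground-view queries vs. 3 aerial references (made-up scores).
sim = [
    [0.9, 0.1, 0.2],  # query 0: true match ranks first
    [0.8, 0.3, 0.1],  # query 1: reference 0 outranks the true match
    [0.2, 0.1, 0.7],  # query 2: true match ranks first
]
r1 = recall_at_k(sim, k=1)
r2 = recall_at_k(sim, k=2)
```

In this toy case two of the three queries retrieve their match at rank one, and all three do so within the top two. Datasets in which a query can have several valid references would need a different matching rule than the diagonal pairing assumed here.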
Cite
Text
Huang et al. "VimGeo: Efficient Cross-View Geo-Localization with Vision Mamba Architecture." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/133
Markdown
[Huang et al. "VimGeo: Efficient Cross-View Geo-Localization with Vision Mamba Architecture." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/huang2025ijcai-vimgeo/) doi:10.24963/IJCAI.2025/133
BibTeX
@inproceedings{huang2025ijcai-vimgeo,
title = {{VimGeo: Efficient Cross-View Geo-Localization with Vision Mamba Architecture}},
author = {Huang, Jinglin and Wu, Maoqiang and Li, Peichun and Wu, Wen and Yu, Rong},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2025},
pages = {1188-1196},
doi = {10.24963/IJCAI.2025/133},
url = {https://mlanthology.org/ijcai/2025/huang2025ijcai-vimgeo/}
}