GAMa: Cross-View Video Geo-Localization

Abstract

Existing work in cross-view geo-localization is based on images, where a ground panorama is matched to an aerial image. In this work, we focus on ground videos instead of images, which provide additional contextual cues that are important for this task. Since there are no existing datasets for this problem, we propose the GAMa dataset, a large-scale dataset of ground videos and corresponding aerial images. We also propose a novel approach to solve this problem: at the clip level, a short video clip is matched with its corresponding aerial image, and these clip-level matches are then aggregated to obtain video-level geo-localization of a long video. Moreover, we propose a hierarchical approach to further improve the clip-level geo-localization. On this challenging dataset, with unaligned images and a limited field of view, our proposed method achieves a Top-1 recall rate of 19.4% and 45.1% @1.0 mile. Code & dataset are available at this link.
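The sketch below is a minimal illustration (not the authors' implementation) of the two-stage idea described in the abstract: clip-level retrieval of aerial images by embedding similarity, followed by aggregation of clip-level matches into a video-level prediction. It assumes that clip and aerial embeddings have already been produced by two encoders sharing an embedding space; random tensors stand in for those encoder outputs here.

import torch
import torch.nn.functional as F

def clip_level_retrieval(clip_embs, aerial_embs, top_k=5):
    """Match each ground-clip embedding to aerial-image embeddings by
    cosine similarity; return top-k aerial candidates per clip."""
    clip_embs = F.normalize(clip_embs, dim=-1)      # (num_clips, d)
    aerial_embs = F.normalize(aerial_embs, dim=-1)  # (num_aerial, d)
    sims = clip_embs @ aerial_embs.T                # (num_clips, num_aerial)
    scores, indices = sims.topk(top_k, dim=-1)
    return scores, indices

def video_level_geolocalization(scores, indices, num_aerial):
    """Aggregate clip-level matches over a long video by accumulating
    similarity scores per aerial image and picking the best one."""
    votes = torch.zeros(num_aerial)
    votes.scatter_add_(0, indices.reshape(-1), scores.reshape(-1))
    return votes.argmax().item()

if __name__ == "__main__":
    # Toy example: random embeddings stand in for encoder outputs.
    torch.manual_seed(0)
    num_clips, num_aerial, d = 8, 100, 128
    clip_embs = torch.randn(num_clips, d)     # short clips of one long video
    aerial_embs = torch.randn(num_aerial, d)  # gallery of aerial images
    scores, indices = clip_level_retrieval(clip_embs, aerial_embs)
    prediction = video_level_geolocalization(scores, indices, num_aerial)
    print("Predicted aerial image index:", prediction)

In the paper's hierarchical variant, a coarser search would first narrow the aerial gallery before this clip-level matching is applied; the aggregation step shown here is only one plausible way to combine clip-level results.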

Cite

Text

Vyas et al. "GAMa: Cross-View Video Geo-Localization." Proceedings of the European Conference on Computer Vision (ECCV), 2022. doi:10.1007/978-3-031-19836-6

Markdown

[Vyas et al. "GAMa: Cross-View Video Geo-Localization." Proceedings of the European Conference on Computer Vision (ECCV), 2022.](https://mlanthology.org/eccv/2022/vyas2022eccv-gama/) doi:10.1007/978-3-031-19836-6

BibTeX

@inproceedings{vyas2022eccv-gama,
  title     = {{GAMa: Cross-View Video Geo-Localization}},
  author    = {Vyas, Shruti and Chen, Chen and Shah, Mubarak},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2022},
  doi       = {10.1007/978-3-031-19836-6},
  url       = {https://mlanthology.org/eccv/2022/vyas2022eccv-gama/}
}