Self-Supervising Fine-Grained Region Similarities for Large-Scale Image Localization

Abstract

The task of large-scale retrieval-based image localization is to estimate the geographical location of a query image by recognizing its nearest reference images from a city-scale dataset. However, public benchmarks generally provide only noisy GPS labels for the training images, which act as weak supervision for learning image-to-image similarities. Such label noise prevents deep neural networks from learning discriminative features for accurate localization. To tackle this challenge, we propose to self-supervise image-to-region similarities in order to fully explore the potential of difficult positive images alongside their sub-regions. The estimated image-to-region similarities serve as extra training supervision for improving the network over successive generations, and the improved network can in turn gradually refine the fine-grained similarities to achieve optimal performance. Our proposed self-enhanced image-to-region similarity labels effectively address the training bottleneck in state-of-the-art pipelines without any additional parameters or manual annotations in either training or inference. Our method outperforms state-of-the-art approaches on the standard localization benchmarks by noticeable margins and shows excellent generalization capability on multiple image retrieval datasets.
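
The retrieval step described in the first sentence of the abstract can be illustrated with a minimal sketch. It is not the authors' method: it only shows the generic idea of localizing a query by ranking reference images by descriptor similarity and reading off their GPS tags. The function names (`l2_normalize`, `cosine_similarity`, `localize`), the three-dimensional toy descriptors, and the coordinates are all hypothetical; in practice the descriptors would come from a trained network.

```python
import math


def l2_normalize(vec):
    """Scale a descriptor to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_similarity(a, b):
    """Cosine similarity between two descriptors of equal length."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def localize(query_descriptor, references, top_k=1):
    """Return the GPS tags of the top_k references most similar to the query.

    `references` is a list of (descriptor, (latitude, longitude)) pairs.
    This structure is an assumption made for illustration only.
    """
    ranked = sorted(
        references,
        key=lambda ref: cosine_similarity(query_descriptor, ref[0]),
        reverse=True,
    )
    return [gps for _, gps in ranked[:top_k]]


# Toy reference set: hypothetical descriptors paired with made-up coordinates.
references = [
    ([1.0, 0.0, 0.0], (37.7749, -122.4194)),
    ([0.0, 1.0, 0.0], (40.7128, -74.0060)),
    ([0.0, 0.0, 1.0], (35.6762, 139.6503)),
]
query = [0.1, 0.9, 0.2]

# The query is closest to the second reference, so its GPS tag is returned.
print(localize(query, references))
```

In this toy setting the query's estimated location is simply the GPS tag of its best match. The noisy-label problem the abstract describes arises earlier, when such GPS tags are used to decide which reference images count as positives during training.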

Cite

Text

Ge et al. "Self-Supervising Fine-Grained Region Similarities for Large-Scale Image Localization." Proceedings of the European Conference on Computer Vision (ECCV), 2020. doi:10.1007/978-3-030-58548-8_22

Markdown

[Ge et al. "Self-Supervising Fine-Grained Region Similarities for Large-Scale Image Localization." Proceedings of the European Conference on Computer Vision (ECCV), 2020.](https://mlanthology.org/eccv/2020/ge2020eccv-selfsupervising/) doi:10.1007/978-3-030-58548-8_22

BibTeX

@inproceedings{ge2020eccv-selfsupervising,
  title     = {{Self-Supervising Fine-Grained Region Similarities for Large-Scale Image Localization}},
  author    = {Ge, Yixiao and Wang, Haibo and Zhu, Feng and Zhao, Rui and Li, Hongsheng},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2020},
  doi       = {10.1007/978-3-030-58548-8_22},
  url       = {https://mlanthology.org/eccv/2020/ge2020eccv-selfsupervising/}
}