Learning to Match Aerial Images with Deep Attentive Architectures

Abstract

Image matching is a fundamental problem in Computer Vision. In the context of feature-based matching, SIFT and its variants have long excelled in a wide array of applications. However, for ultra-wide baselines, as in the case of aerial images captured under large camera rotations, the appearance variation goes beyond the reach of SIFT and RANSAC. In this paper we propose a data-driven, deep learning-based approach that sidesteps local correspondence by framing the problem as a classification task. Furthermore, we demonstrate that local correspondences can still be useful. To do so we incorporate an attention mechanism to produce a set of probable matches, which allows us to further increase performance. We train our models on a dataset of urban aerial imagery consisting of 'same' and 'different' pairs, collected for this purpose, and characterize the problem via a human study with annotations from Amazon Mechanical Turk. We demonstrate that our models outperform the state-of-the-art on ultra-wide baseline matching and approach human accuracy.
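
To make the classification framing concrete, below is a minimal sketch of pair matching as binary classification: a shared convolutional encoder embeds each image, the two embeddings are fused, and a classifier predicts 'same' or 'different'. This is an illustrative PyTorch-style sketch under assumed layer sizes and names (PairMatchNet, encoder, classifier), not the authors' actual architecture, and it omits the paper's attention-based match-proposal component.

import torch
import torch.nn as nn

class PairMatchNet(nn.Module):
    """Two-stream CNN: embed each image, fuse, classify same/different."""

    def __init__(self):
        super().__init__()
        # Shared convolutional encoder (weights tied across both images).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling -> (B, 128, 1, 1)
        )
        # Classifier over the concatenated pair embedding.
        self.classifier = nn.Sequential(
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 2),  # logits for 'different' / 'same'
        )

    def forward(self, img_a, img_b):
        feat_a = self.encoder(img_a).flatten(1)
        feat_b = self.encoder(img_b).flatten(1)
        return self.classifier(torch.cat([feat_a, feat_b], dim=1))

# Usage: a batch of 4 image pairs at an assumed 128x128 resolution.
net = PairMatchNet()
a = torch.randn(4, 3, 128, 128)
b = torch.randn(4, 3, 128, 128)
logits = net(a, b)                   # shape (4, 2)
labels = torch.tensor([1, 0, 1, 0])  # 1 = 'same', 0 = 'different'
loss = nn.CrossEntropyLoss()(logits, labels)

Training such a network on labeled 'same'/'different' pairs avoids explicit local correspondence entirely, which is what lets it cope with appearance changes that defeat SIFT and RANSAC under ultra-wide baselines.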

Cite

Text

Altwaijry et al. "Learning to Match Aerial Images with Deep Attentive Architectures." Conference on Computer Vision and Pattern Recognition, 2016. doi:10.1109/CVPR.2016.385

Markdown

[Altwaijry et al. "Learning to Match Aerial Images with Deep Attentive Architectures." Conference on Computer Vision and Pattern Recognition, 2016.](https://mlanthology.org/cvpr/2016/altwaijry2016cvpr-learning/) doi:10.1109/CVPR.2016.385

BibTeX

@inproceedings{altwaijry2016cvpr-learning,
  title     = {{Learning to Match Aerial Images with Deep Attentive Architectures}},
  author    = {Altwaijry, Hani and Trulls, Eduard and Hays, James and Fua, Pascal and Belongie, Serge},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2016},
  doi       = {10.1109/CVPR.2016.385},
  url       = {https://mlanthology.org/cvpr/2016/altwaijry2016cvpr-learning/}
}