D2-Net: A Trainable CNN for Joint Description and Detection of Local Features

Abstract

In this work we address the problem of finding reliable pixel-level correspondences under difficult imaging conditions. We propose an approach where a single convolutional neural network plays a dual role: It is simultaneously a dense feature descriptor and a feature detector. By postponing the detection to a later stage, the obtained keypoints are more stable than their traditional counterparts based on early detection of low-level structures. We show that this model can be trained using pixel correspondences extracted from readily available large-scale SfM reconstructions, without any further annotations. The proposed method obtains state-of-the-art performance on both the difficult Aachen Day-Night localization dataset and the InLoc indoor localization benchmark, as well as competitive performance on other benchmarks for image matching and 3D reconstruction.
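The dual role described in the abstract can be illustrated with a minimal sketch in which one dense feature map supplies both descriptors and keypoints. Everything below is an assumption made for illustration, not the authors' implementation: the function name `describe_and_detect`, the `(C, H, W)` array layout, the 3x3 neighbourhood, and the simple hard rule that a pixel is a keypoint when its strongest channel is also a spatial local maximum in that channel. The feature map is taken as given; no network or training procedure is shown.

```python
import numpy as np


def describe_and_detect(features):
    """Illustrative sketch: use one dense feature map for both roles.

    features: array of shape (C, H, W), assumed to come from some CNN
    backbone (hypothetical here). Returns (keypoints, descriptors) with
    keypoints of shape (N, 2) as (row, col) and descriptors of shape (N, C).
    """
    features = np.asarray(features, dtype=np.float64)
    C, H, W = features.shape

    # Descriptor role: the C-dimensional vector at each pixel, L2-normalised.
    norms = np.linalg.norm(features, axis=0, keepdims=True)
    descriptors = features / np.maximum(norms, 1e-12)

    # Detector role, step 1: per-channel maximum over each 3x3 neighbourhood.
    # Padding with -inf means border pixels only compare with real neighbours.
    padded = np.pad(features, ((0, 0), (1, 1), (1, 1)),
                    constant_values=-np.inf)
    neigh_max = np.full_like(features, -np.inf)
    for dy in range(3):
        for dx in range(3):
            neigh_max = np.maximum(neigh_max,
                                   padded[:, dy:dy + H, dx:dx + W])
    is_spatial_max = features >= neigh_max

    # Detector role, step 2: the response must also be the strongest
    # across channels at that pixel.
    is_channel_max = features >= features.max(axis=0, keepdims=True)

    # A pixel is kept if any channel satisfies both conditions.
    # Note: ties count as maxima, so perfectly flat regions are all kept.
    detected = np.any(is_spatial_max & is_channel_max, axis=0)
    keypoints = np.argwhere(detected)

    # Gather the descriptors at the detected locations: (C, N) -> (N, C).
    kp_desc = descriptors[:, keypoints[:, 0], keypoints[:, 1]].T
    return keypoints, kp_desc
```

Because detection is read off the same high-level feature map that provides the descriptors, keypoints are selected after description rather than from low-level image structure beforehand, which is the ordering the abstract refers to as postponing detection to a later stage.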

Cite

Text

Dusmanu et al. "D2-Net: A Trainable CNN for Joint Description and Detection of Local Features." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019. doi:10.1109/CVPR.2019.00828

Markdown

[Dusmanu et al. "D2-Net: A Trainable CNN for Joint Description and Detection of Local Features." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.](https://mlanthology.org/cvpr/2019/dusmanu2019cvpr-d2net/) doi:10.1109/CVPR.2019.00828

BibTeX

@inproceedings{dusmanu2019cvpr-d2net,
  title     = {{D2-Net: A Trainable CNN for Joint Description and Detection of Local Features}},
  author    = {Dusmanu, Mihai and Rocco, Ignacio and Pajdla, Tomas and Pollefeys, Marc and Sivic, Josef and Torii, Akihiko and Sattler, Torsten},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2019},
  doi       = {10.1109/CVPR.2019.00828},
  url       = {https://mlanthology.org/cvpr/2019/dusmanu2019cvpr-d2net/}
}