D2Former: Jointly Learning Hierarchical Detectors and Contextual Descriptors via Agent-Based Transformers

Abstract

Establishing pixel-level matches between image pairs is vital for a variety of computer vision applications. However, robust image matching remains challenging because CNN-extracted descriptors usually lack discriminative ability in texture-less regions, and keypoint detectors are good only at identifying keypoints with a specific level of structure. To address these issues, we propose a novel image matching method that Jointly Learns Hierarchical Detectors and Contextual Descriptors via Agent-based Transformers (D2Former). It comprises a contextual feature descriptor learning (CFDL) module and a hierarchical keypoint detector learning (HKDL) module. D2Former has two main merits. First, the CFDL module models long-range context efficiently and effectively with the aid of purpose-designed descriptor agents. Second, the HKDL module generates keypoint detectors hierarchically, which helps detect keypoints with diverse levels of structure. Extensive experimental results on four challenging benchmarks show that the proposed method significantly outperforms state-of-the-art image matching methods.

Cite

Text

He et al. "D2Former: Jointly Learning Hierarchical Detectors and Contextual Descriptors via Agent-Based Transformers." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.00284

Markdown

[He et al. "D2Former: Jointly Learning Hierarchical Detectors and Contextual Descriptors via Agent-Based Transformers." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/he2023cvpr-d2former/) doi:10.1109/CVPR52729.2023.00284

BibTeX

@inproceedings{he2023cvpr-d2former,
  title     = {{D2Former: Jointly Learning Hierarchical Detectors and Contextual Descriptors via Agent-Based Transformers}},
  author    = {He, Jianfeng and Gao, Yuan and Zhang, Tianzhu and Zhang, Zhe and Wu, Feng},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {2904--2914},
  doi       = {10.1109/CVPR52729.2023.00284},
  url       = {https://mlanthology.org/cvpr/2023/he2023cvpr-d2former/}
}