Geometrized Transformer for Self-Supervised Homography Estimation

Abstract

For homography estimation, we propose Geometrized Transformer (GeoFormer), a new detector-free feature matching method. Current detector-free methods, e.g. LoFTR, lack an effective means of accurately localizing small, and thus computationally feasible, regions for cross-attention diffusion. We resolve this challenge with an extremely simple idea: using classical RANSAC geometry for attentive region search. Given coarse matches produced by LoFTR, a homography is obtained with ease. Such a homography lets us compute cross-attention in a focused manner: the key/value sets required by Transformers can be reduced to small fixed-size regions rather than an entire image. Local features can thus be enhanced by standard Transformers. We integrate GeoFormer into the LoFTR framework. By minimizing a multi-scale cross-entropy-based matching loss on auto-generated training data, the network is trained in a fully self-supervised manner. Extensive experiments are conducted on multiple real-world datasets covering natural images, heavily manipulated pictures, and retinal images. The proposed method compares favorably against the state of the art.
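The core idea of the abstract can be illustrated in a few lines: estimate a homography from coarse matches, project each query point through it, and restrict the key/value set to a fixed-size window around the projection instead of the whole image. Below is a minimal NumPy sketch of that geometry-guided region search; it uses a plain DLT least-squares fit as a stand-in for the paper's RANSAC estimator, and the function names (`estimate_homography`, `attentive_windows`) are illustrative, not from the paper's code.

```python
import numpy as np

def estimate_homography(src, dst):
    """Direct Linear Transform: solve dst ~ H @ src (homogeneous coords).

    A simplified stand-in for the RANSAC-based estimator used in the paper;
    it assumes all coarse matches are inliers.
    """
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The null vector of A (last right singular vector) holds the 9 entries of H.
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def attentive_windows(H, query_pts, window=16):
    """Project query points through H; each key/value region is a small
    fixed-size window around the projection rather than the entire image."""
    pts = np.hstack([query_pts, np.ones((len(query_pts), 1))])
    proj = (H @ pts.T).T
    proj = proj[:, :2] / proj[:, 2:3]          # de-homogenize
    tl = np.round(proj).astype(int) - window // 2
    return [(int(x), int(y), window, window) for x, y in tl]  # (x, y, w, h) boxes
```

With a good homography, each query point attends only to a `window × window` crop in the other image, so the cost of cross-attention no longer scales with image size.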

Cite

Text

Liu and Li. "Geometrized Transformer for Self-Supervised Homography Estimation." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.00876

Markdown

[Liu and Li. "Geometrized Transformer for Self-Supervised Homography Estimation." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/liu2023iccv-geometrized/) doi:10.1109/ICCV51070.2023.00876

BibTeX

@inproceedings{liu2023iccv-geometrized,
  title     = {{Geometrized Transformer for Self-Supervised Homography Estimation}},
  author    = {Liu, Jiazhen and Li, Xirong},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {9556--9565},
  doi       = {10.1109/ICCV51070.2023.00876},
  url       = {https://mlanthology.org/iccv/2023/liu2023iccv-geometrized/}
}