Cross-View Completion Models Are Zero-Shot Correspondence Estimators

Abstract

In this work, we analyze new aspects of cross-view completion, mainly by drawing an analogy between cross-view completion and traditional self-supervised correspondence learning algorithms. Based on this analysis, we show that the cross-attention map of Croco-v2 reflects correspondence information better than other correlations computed from the encoder or decoder features. We further verify the effectiveness of the cross-attention map by evaluating it on zero-shot and supervised dense geometric correspondence and on multi-frame depth estimation.
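To illustrate how a cross-attention map can be read as a correspondence estimate, the following minimal sketch takes, for each target patch, the source patch with the highest attention weight. It is a generic argmax readout, not the authors' pipeline. The function name `attention_to_matches`, the row-major patch layout, and the toy attention values are all assumptions made for this example.

```python
def attention_to_matches(attn, grid_w):
    """Map each target patch to the source patch with the highest attention weight.

    attn:   list of rows; attn[i][j] is the weight from target patch i to
            source patch j (hypothetical layout, assumed for this sketch).
    grid_w: number of patches per row in the source grid (row-major order assumed).
    Returns one (row, col) source-grid coordinate per target patch.
    """
    matches = []
    for row in attn:
        # Argmax over source patches for this target patch.
        j = max(range(len(row)), key=row.__getitem__)
        # Convert the flat patch index into (row, col) grid coordinates.
        matches.append(divmod(j, grid_w))
    return matches


# Toy example: a 2x2 source grid (4 patches) and 3 target patches.
attn = [
    [0.10, 0.70, 0.10, 0.10],
    [0.60, 0.20, 0.10, 0.10],
    [0.05, 0.05, 0.10, 0.80],
]
matches = attention_to_matches(attn, grid_w=2)
```

A hard argmax gives matches at patch resolution only. A soft readout, such as an attention-weighted average of source coordinates, is a common alternative when sub-patch precision is needed.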

Cite

Text

An et al. "Cross-View Completion Models Are Zero-Shot Correspondence Estimators." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.00111

Markdown

[An et al. "Cross-View Completion Models Are Zero-Shot Correspondence Estimators." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/an2025cvpr-crossview/) doi:10.1109/CVPR52734.2025.00111

BibTeX

@inproceedings{an2025cvpr-crossview,
  title     = {{Cross-View Completion Models Are Zero-Shot Correspondence Estimators}},
  author    = {An, Honggyu and Kim, Jin Hyeon and Park, Seonghoon and Jung, Jaewoo and Han, Jisang and Hong, Sunghwan and Kim, Seungryong},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {1103-1115},
  doi       = {10.1109/CVPR52734.2025.00111},
  url       = {https://mlanthology.org/cvpr/2025/an2025cvpr-crossview/}
}