Cross-View Completion Models Are Zero-Shot Correspondence Estimators
Abstract
In this work, we analyze new aspects of cross-view completion, mainly through the analogy between cross-view completion and traditional self-supervised correspondence learning algorithms. Based on our analysis, we reveal that the cross-attention map of Croco-v2 best reflects this correspondence information compared to other correlations from the encoder or decoder features. We further verify the effectiveness of the cross-attention map by evaluating on both zero-shot and supervised dense geometric correspondence and multi-frame depth estimation.
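To make the idea of reading correspondences from a cross-attention map concrete, here is a minimal sketch. It is not the authors' implementation. It assumes the attention map has already been extracted and averaged over heads, stored as one row of weights per target patch over all source patches laid out on a regular grid. The function name `attention_to_matches` and the parameter `grid_w` are hypothetical.

```python
from typing import List, Tuple


def attention_to_matches(attn: List[List[float]], grid_w: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: for each target patch (one row of `attn`),
    return the (row, col) grid position of the source patch that
    receives the highest attention weight (a hard argmax match)."""
    matches = []
    for weights in attn:
        # Index of the source patch with the largest attention weight.
        best = max(range(len(weights)), key=weights.__getitem__)
        # Convert the flat patch index into (row, col) on the source grid.
        matches.append(divmod(best, grid_w))
    return matches


# Toy example: a 2x2 patch grid, so 4 target patches attend over 4 source patches.
attn = [
    [0.1, 0.7, 0.1, 0.1],
    [0.6, 0.2, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.7],
    [0.1, 0.1, 0.7, 0.1],
]
matches = attention_to_matches(attn, grid_w=2)
print(matches)
```

A hard argmax yields matches only at patch resolution. A soft-argmax (an attention-weighted average of source coordinates) would be one way to obtain sub-patch estimates, but that choice is likewise an assumption here and is not taken from the abstract.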
Cite
Text
An et al. "Cross-View Completion Models Are Zero-Shot Correspondence Estimators." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.00111
Markdown
[An et al. "Cross-View Completion Models Are Zero-Shot Correspondence Estimators." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/an2025cvpr-crossview/) doi:10.1109/CVPR52734.2025.00111
BibTeX
@inproceedings{an2025cvpr-crossview,
title = {{Cross-View Completion Models Are Zero-Shot Correspondence Estimators}},
author = {An, Honggyu and Kim, Jin Hyeon and Park, Seonghoon and Jung, Jaewoo and Han, Jisang and Hong, Sunghwan and Kim, Seungryong},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2025},
pages = {1103-1115},
doi = {10.1109/CVPR52734.2025.00111},
url = {https://mlanthology.org/cvpr/2025/an2025cvpr-crossview/}
}