Estimating Extreme 3D Image Rotations Using Cascaded Attention
Abstract
Estimating extreme inter-image rotations is critical for numerous computer vision domains involving images related by limited or non-overlapping fields of view. In this work, we propose an attention-based approach with a pipeline of novel algorithmic components. First, as rotation estimation pertains to image pairs, we introduce an inter-image distillation scheme using Decoders to improve embeddings. Second, whereas contemporary methods compute a 4D correlation volume (4DCV) encoding inter-image relationships, we propose an Encoder-based cross-attention approach between activation maps to compute an enhanced equivalent of the 4DCV. Finally, we present a cascaded Decoder-based technique for alternately refining the cross-attention and the rotation query. Our approach outperforms current state-of-the-art methods on extreme rotation estimation. We make our code publicly available.
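To make the relationship between a 4D correlation volume and cross-attention concrete, the following is a minimal, dependency-free sketch. It is not the authors' implementation: the function names `correlation_volume` and `cross_attention_weights`, the toy feature maps, and the use of plain, unlearned scaled dot-product attention are all assumptions made for illustration. The paper's Encoder uses learned components, which are omitted here.

```python
import math


def correlation_volume(f1, f2):
    """Plain 4D correlation volume: vol[i][j][k][l] = <f1[i][j], f2[k][l]>.

    f1 and f2 are activation maps given as nested lists of shape H x W x C.
    """
    return [
        [
            [[sum(a * b for a, b in zip(p, q)) for q in row2] for row2 in f2]
            for p in row1
        ]
        for row1 in f1
    ]


def cross_attention_weights(f1, f2):
    """Attention of every f1 location over all f2 locations.

    Hypothetical illustration: each (i, j) slice of the correlation volume
    is scaled by sqrt(C) and softmax-normalised, with no learned projections.
    """
    channels = len(f1[0][0])
    scale = math.sqrt(channels)
    weights = []
    for row in correlation_volume(f1, f2):
        out_row = []
        for scores in row:
            width = len(scores[0])
            flat = [v / scale for r in scores for v in r]
            peak = max(flat)  # subtract the max for numerical stability
            exps = [math.exp(v - peak) for v in flat]
            total = sum(exps)
            probs = [e / total for e in exps]
            # Restore the (k, l) layout of the second image.
            out_row.append(
                [probs[k * width:(k + 1) * width] for k in range(len(scores))]
            )
        weights.append(out_row)
    return weights


# Toy 2 x 2 activation maps with 2 channels each (made-up values).
f1 = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.0, 0.0]]]
f2 = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.0, 0.0]]]

vol = correlation_volume(f1, f2)
attn = cross_attention_weights(f1, f2)
```

In this sketch, both outputs are indexed by a location in the first image followed by a location in the second. The attention weights differ from the raw correlations only in that each slice is normalised to sum to one.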
Cite
Text
Dekel et al. "Estimating Extreme 3D Image Rotations Using Cascaded Attention." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00250
Markdown
[Dekel et al. "Estimating Extreme 3D Image Rotations Using Cascaded Attention." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/dekel2024cvpr-estimating/) doi:10.1109/CVPR52733.2024.00250
BibTeX
@inproceedings{dekel2024cvpr-estimating,
title = {{Estimating Extreme 3D Image Rotations Using Cascaded Attention}},
author = {Dekel, Shay and Keller, Yosi and Cadik, Martin},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2024},
pages = {2588--2598},
doi = {10.1109/CVPR52733.2024.00250},
url = {https://mlanthology.org/cvpr/2024/dekel2024cvpr-estimating/}
}