Revisiting the Receptive Field of Conv-GRU in DROID-SLAM
Abstract
This work focuses on improving the Conv-GRU-based optical flow update within the DROID-SLAM framework. Prior optical flow models typically follow a UNet or coarse-to-fine architecture in order to extract long-range cross-correlation and context cues. This helps flow estimation in the presence of large motion and challenging image regions, e.g., textureless regions. We propose modifications to the Conv-GRU module that follow the rationale of these prior models, integrating (Atrous) Spatial Pyramid Pooling and global self-attention into the Conv-GRU block. By enlarging the receptive field through these modifications, the model can integrate information from a larger context window, improving robustness even on inputs that contain challenging image regions. Extensive experiments empirically demonstrate the accuracy gains from these modifications.
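The core idea of enlarging the Conv-GRU's receptive field can be sketched as follows. This is a minimal illustrative implementation, not the authors' code: the module names (`ASPP`, `ASPPConvGRU`), channel sizes, and dilation rates are assumptions, and only the ASPP branch is shown (the paper additionally adds global self-attention).

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling: parallel dilated 3x3 convs
    fused by a 1x1 conv, enlarging the effective receptive field."""
    def __init__(self, ch, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=r, dilation=r) for r in rates
        )
        self.fuse = nn.Conv2d(ch * len(rates), ch, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

class ASPPConvGRU(nn.Module):
    """Conv-GRU cell whose gates see ASPP-enriched [hidden, input] features
    instead of features from plain 3x3 convolutions alone."""
    def __init__(self, hidden_dim, input_dim):
        super().__init__()
        ch = hidden_dim + input_dim
        self.aspp = ASPP(ch)
        self.convz = nn.Conv2d(ch, hidden_dim, 3, padding=1)  # update gate
        self.convr = nn.Conv2d(ch, hidden_dim, 3, padding=1)  # reset gate
        self.convq = nn.Conv2d(ch, hidden_dim, 3, padding=1)  # candidate state

    def forward(self, h, x):
        hx = self.aspp(torch.cat([h, x], dim=1))  # context from a wide window
        z = torch.sigmoid(self.convz(hx))
        r = torch.sigmoid(self.convr(hx))
        q = torch.tanh(self.convq(torch.cat([r * h, x], dim=1)))
        return (1 - z) * h + z * q  # standard GRU state update

cell = ASPPConvGRU(hidden_dim=8, input_dim=4)
h = torch.zeros(1, 8, 16, 16)
x = torch.randn(1, 4, 16, 16)
h_new = cell(h, x)  # hidden state keeps shape (1, 8, 16, 16)
```

Because each dilated branch pads by its dilation rate, spatial resolution is preserved, so the cell is a drop-in replacement for a plain Conv-GRU update operator.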
Cite
Text
Bangunharcana et al. "Revisiting the Receptive Field of Conv-GRU in DROID-SLAM." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022. doi:10.1109/CVPRW56347.2022.00207
Markdown
[Bangunharcana et al. "Revisiting the Receptive Field of Conv-GRU in DROID-SLAM." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022.](https://mlanthology.org/cvprw/2022/bangunharcana2022cvprw-revisiting/) doi:10.1109/CVPRW56347.2022.00207
BibTeX
@inproceedings{bangunharcana2022cvprw-revisiting,
title = {{Revisiting the Receptive Field of Conv-GRU in DROID-SLAM}},
author = {Bangunharcana, Antyanta and Kim, Soohyun and Kim, Kyung-Soo},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2022},
pages = {1905--1915},
doi = {10.1109/CVPRW56347.2022.00207},
url = {https://mlanthology.org/cvprw/2022/bangunharcana2022cvprw-revisiting/}
}