R-MSFM: Recurrent Multi-Scale Feature Modulation for Monocular Depth Estimating
Abstract
In this paper, we propose Recurrent Multi-Scale Feature Modulation (R-MSFM), a new deep network architecture for self-supervised monocular depth estimation. R-MSFM extracts per-pixel features, builds a multi-scale feature modulation module, and iteratively updates an inverse depth map through a parameter-shared decoder at a fixed resolution. This architecture enables R-MSFM to maintain representations that are both semantically richer and spatially more precise, and to avoid the error propagation caused by the traditional U-Net-like coarse-to-fine architectures widely used in this domain, resulting in strong generalization and a compact parameter count. Experimental results demonstrate the superiority of the proposed R-MSFM in both model size and inference speed, and show state-of-the-art results on the KITTI benchmark. Code is available at https://github.com/jsczzzk/R-MSFM
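The abstract describes an iterative scheme: per-pixel features are extracted once, a multi-scale modulation step conditions them on the current estimate, and a single parameter-shared decoder repeatedly produces residual updates to the inverse depth at a fixed resolution. A minimal sketch of that loop structure, in plain Python with toy stand-ins (`extract_features`, `modulate`, and `decode_update` are hypothetical placeholders, not R-MSFM's actual modules):

```python
def extract_features(image):
    # Toy per-pixel "feature": the intensity itself. In R-MSFM this would be
    # a learned feature extractor run once per image.
    return image

def modulate(features, inv_depth):
    # Toy modulation: condition the features on the current inverse-depth
    # estimate (here, a simple residual between the two).
    return [[f - d for f, d in zip(frow, drow)]
            for frow, drow in zip(features, inv_depth)]

def decode_update(modulated):
    # Toy parameter-shared "decoder": the SAME function (same parameters)
    # is applied at every iteration to emit a damped residual update.
    return [[0.5 * m for m in row] for row in modulated]

def refine_inverse_depth(image, n_iters=3):
    features = extract_features(image)                 # extracted once
    inv_depth = [[0.0] * len(row) for row in image]    # initial estimate
    for _ in range(n_iters):                           # fixed resolution,
        modulated = modulate(features, inv_depth)      # no coarse-to-fine
        delta = decode_update(modulated)
        inv_depth = [[d + dd for d, dd in zip(drow, erow)]
                     for drow, erow in zip(inv_depth, delta)]
    return inv_depth
```

The key design point the sketch mirrors: because one decoder is reused across iterations at full working resolution, errors are refined away rather than propagated upward through a U-Net-style coarse-to-fine pyramid, and the parameter count stays small.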
Cite
Text
Zhou et al. "R-MSFM: Recurrent Multi-Scale Feature Modulation for Monocular Depth Estimating." International Conference on Computer Vision, 2021. doi:10.1109/ICCV48922.2021.01254
Markdown
[Zhou et al. "R-MSFM: Recurrent Multi-Scale Feature Modulation for Monocular Depth Estimating." International Conference on Computer Vision, 2021.](https://mlanthology.org/iccv/2021/zhou2021iccv-rmsfm/) doi:10.1109/ICCV48922.2021.01254
BibTeX
@inproceedings{zhou2021iccv-rmsfm,
title = {{R-MSFM: Recurrent Multi-Scale Feature Modulation for Monocular Depth Estimating}},
author = {Zhou, Zhongkai and Fan, Xinnan and Shi, Pengfei and Xin, Yuanxue},
booktitle = {International Conference on Computer Vision},
year = {2021},
pages = {12777--12786},
doi = {10.1109/ICCV48922.2021.01254},
url = {https://mlanthology.org/iccv/2021/zhou2021iccv-rmsfm/}
}