MonSter: Marry Monodepth to Stereo Unleashes Power

Abstract

Stereo matching recovers depth from image correspondences. Existing methods struggle to handle ill-posed regions with limited matching cues, such as occlusions and textureless areas. To address this, we propose MonSter, a novel method that leverages the complementary strengths of monocular depth estimation and stereo matching. MonSter integrates monocular depth and stereo matching into a dual-branch architecture in which the two branches iteratively improve each other. Confidence-based guidance adaptively selects reliable stereo cues for monodepth scale-shift recovery, and uses explicit monocular depth priors to enhance stereo matching in ill-posed regions. This iterative mutual enhancement enables MonSter to evolve monodepth priors from coarse object-level structures to pixel-level geometry, fully unlocking the potential of stereo matching. As shown in Fig. 2, MonSter ranks 1st on the five most commonly used leaderboards: SceneFlow, KITTI 2012, KITTI 2015, Middlebury, and ETH3D, achieving up to 49.5% improvement over the previous best method (Bad 1.0 on ETH3D). Comprehensive analysis verifies the effectiveness of MonSter in ill-posed regions. In terms of zero-shot generalization, MonSter significantly and consistently outperforms state-of-the-art methods across the board. Code will be released upon acceptance.
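
The scale-shift recovery step mentioned above can be illustrated with a minimal sketch: treating monocular depth as correct only up to an unknown affine transform, high-confidence stereo pixels are used to fit a scale and shift via weighted least squares. The function name, confidence threshold, and weighting scheme below are illustrative assumptions, not the authors' implementation; inputs are H x W NumPy arrays with confidence in [0, 1].

```python
import numpy as np

def align_mono_to_stereo(mono_inv_depth, stereo_disp, confidence, thresh=0.8):
    """Fit scale s and shift t so that s * mono_inv_depth + t approximates
    stereo_disp on pixels whose stereo confidence exceeds `thresh`.
    (Hypothetical helper for illustration, not the paper's code.)"""
    mask = confidence > thresh                      # keep only reliable stereo cues
    x = mono_inv_depth[mask].ravel()
    y = stereo_disp[mask].ravel()
    w = np.sqrt(confidence[mask].ravel())           # weight residuals by confidence
    # Weighted least squares: minimize sum_i conf_i * (s * x_i + t - y_i)^2
    A = np.stack([x, np.ones_like(x)], axis=1) * w[:, None]
    s, t = np.linalg.lstsq(A, y * w, rcond=None)[0]
    return s * mono_inv_depth + t                   # monodepth aligned to stereo scale
```

The aligned monodepth can then serve as a dense prior for the stereo branch in regions where matching cues are weak, which is the role the abstract ascribes to the monocular priors.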

Cite

Text

Cheng et al. "MonSter: Marry Monodepth to Stereo Unleashes Power." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.00588

Markdown

[Cheng et al. "MonSter: Marry Monodepth to Stereo Unleashes Power." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/cheng2025cvpr-monster/) doi:10.1109/CVPR52734.2025.00588

BibTeX

@inproceedings{cheng2025cvpr-monster,
  title     = {{MonSter: Marry Monodepth to Stereo Unleashes Power}},
  author    = {Cheng, Junda and Liu, Longliang and Xu, Gangwei and Wang, Xianqi and Zhang, Zhaoxing and Deng, Yong and Zang, Jinliang and Chen, Yurui and Cai, Zhipeng and Yang, Xin},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {6273--6282},
  doi       = {10.1109/CVPR52734.2025.00588},
  url       = {https://mlanthology.org/cvpr/2025/cheng2025cvpr-monster/}
}