Hyper-Depth: Hypergraph-Based Multi-Scale Representation Fusion for Monocular Depth Estimation
Abstract
Monocular depth estimation (MDE) is a fundamental problem in computer vision with wide-ranging applications in various downstream tasks. While multi-scale features are perceptually critical for MDE, existing transformer-based methods have yet to leverage them explicitly. To address this limitation, we propose a hypergraph-based multi-scale representation fusion framework, Hyper-Depth.The proposed Hyper-Depth incorporates two key components: a Semantic Consistency Enhancement (SCE) module and a Geometric Consistency Constraint (GCC) module. The SCE module, designed based on hypergraph convolution, aggregates global information and enhances the representation of multi-scale patch features. Meanwhile, the GCC module provides geometric guidance to reduce over-fitting errors caused by excessive reliance on local features. In addition, we introduce a Correlation-based Conditional Random Fields (C-CRFs) module as the decoder to filter correlated patches and compute attention weights more effectively.Extensive experiments demonstrate that our method significantly outperforms state-of-the-art approaches across all evaluation metrics on the KITTI and NYU-Depth-v2 datasets, achieving improvements of 6.21% and 3.32% on the main metric RMSE, respectively. Furthermore, zero-shot evaluations on the nuScenes and SUN-RGBD datasets validate the generalizability of our approach.
Cite
Text
Bie et al. "Hyper-Depth: Hypergraph-Based Multi-Scale Representation Fusion for Monocular Depth Estimation." International Conference on Computer Vision, 2025.Markdown
[Bie et al. "Hyper-Depth: Hypergraph-Based Multi-Scale Representation Fusion for Monocular Depth Estimation." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/bie2025iccv-hyperdepth/)BibTeX
@inproceedings{bie2025iccv-hyperdepth,
title = {{Hyper-Depth: Hypergraph-Based Multi-Scale Representation Fusion for Monocular Depth Estimation}},
author = {Bie, Lin and Li, Siqi and Feng, Yifan and Gao, Yue},
booktitle = {International Conference on Computer Vision},
year = {2025},
pages = {5081-5090},
url = {https://mlanthology.org/iccv/2025/bie2025iccv-hyperdepth/}
}