ODHSR: Online Dense 3D Reconstruction of Humans and Scenes from Monocular Videos
Abstract
Reconstructing a photorealistic scene and human from a single in-the-wild monocular video is central to perceiving a human-centric 3D world. Recent neural rendering advances have enabled holistic human-scene reconstruction but require pre-calibrated camera and human poses, as well as days of training time. In this work, we introduce a novel unified framework that simultaneously performs camera tracking, human pose estimation and human-scene reconstruction in an online fashion. 3D Gaussian Splatting is utilized to learn Gaussian primitives for humans and scenes efficiently, and reconstruction-based camera tracking and human pose estimation modules are designed to enable holistic understanding and effective disentanglement of pose and appearance. Specifically, we design a human deformation module to faithfully reconstruct details and enhance generalizability to out-of-distribution poses. To accurately learn the spatial correlation between human and scene, we introduce occlusion-aware human silhouette rendering and monocular geometric priors, which further improve reconstruction quality. Experiments on the EMDB and NeuMan datasets demonstrate performance superior to or on par with existing methods in camera tracking, human pose estimation, novel view synthesis and runtime. Our project page is at https://eth-ait.github.io/ODHSR.
Cite
Text
Zhang et al. "ODHSR: Online Dense 3D Reconstruction of Humans and Scenes from Monocular Videos." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.02033
Markdown
[Zhang et al. "ODHSR: Online Dense 3D Reconstruction of Humans and Scenes from Monocular Videos." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/zhang2025cvpr-odhsr/) doi:10.1109/CVPR52734.2025.02033
BibTeX
@inproceedings{zhang2025cvpr-odhsr,
title = {{ODHSR: Online Dense 3D Reconstruction of Humans and Scenes from Monocular Videos}},
author = {Zhang, Zetong and Kaufmann, Manuel and Xue, Lixin and Song, Jie and Oswald, Martin R.},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2025},
pages = {21824--21835},
doi = {10.1109/CVPR52734.2025.02033},
url = {https://mlanthology.org/cvpr/2025/zhang2025cvpr-odhsr/}
}