MOHO: Learning Single-View Hand-Held Object Reconstruction with Multi-View Occlusion-Aware Supervision
Abstract
Previous works on single-view hand-held object reconstruction typically rely on supervision from 3D ground-truth models, which are hard to collect in the real world. In contrast, readily accessible hand-object videos offer a promising source of training data, but they provide only heavily occluded observations of the object. In this paper, we present a novel synthetic-to-real framework that exploits Multi-view Occlusion-aware supervision from hand-object videos for Hand-held Object reconstruction (MOHO) from a single image, tackling two predominant challenges in this setting: hand-induced occlusion and the object's self-occlusion. First, in the synthetic pre-training stage, we render a large-scale synthetic dataset, SOMVideo, containing hand-object images and multi-view occlusion-free supervision, which is used to address hand-induced occlusion in both 2D and 3D spaces. Second, in the real-world finetuning stage, MOHO leverages amodal-mask-weighted geometric supervision to mitigate the unfaithful guidance caused by hand-occluded supervising views in the real world. Moreover, MOHO amalgamates domain-consistent occlusion-aware features to resist the object's self-occlusion when inferring the complete object shape. Extensive experiments on the HO3D and DexYCB datasets demonstrate that 2D-supervised MOHO outperforms 3D-supervised methods by a large margin.
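The abstract does not specify how the amodal-mask weighting is computed. The snippet below is therefore only a minimal, hypothetical sketch of the general idea of down-weighting unreliable pixels in a supervising view. It assumes that each view supplies an amodal object mask (where the full object would project) and a visible mask (where the object is actually observed). Pixels inside the amodal mask but not visible are treated as hand-occluded and receive a reduced weight in a per-pixel L1 loss. The names `amodal_weights`, `weighted_l1`, and `occluded_weight` are illustrative and do not come from the paper.

```python
from typing import List, Sequence


def amodal_weights(amodal: Sequence[float], visible: Sequence[float],
                   occluded_weight: float = 0.0) -> List[float]:
    """Per-pixel weights for one supervising view (hypothetical scheme).

    amodal:  1.0 where the full, unoccluded object would project, else 0.0.
    visible: 1.0 where the object is actually observed, else 0.0.
    Pixels inside the amodal mask but not visible are assumed to be hidden
    by the hand, so their observed labels are unreliable and receive
    `occluded_weight`; every other pixel keeps weight 1.0.
    """
    if len(amodal) != len(visible):
        raise ValueError("masks must have the same number of pixels")
    weights = []
    for a, v in zip(amodal, visible):
        hand_occluded = a > 0.5 and v < 0.5
        weights.append(occluded_weight if hand_occluded else 1.0)
    return weights


def weighted_l1(pred: Sequence[float], target: Sequence[float],
                weights: Sequence[float]) -> float:
    """Weighted mean absolute error; returns 0.0 if all weights are zero."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * abs(p - t) for p, t, w in zip(pred, target, weights)) / total


# Four pixels: pixel 1 belongs to the object but is hidden by the hand, so the
# observed silhouette wrongly marks it as background (target 0.0).
w = amodal_weights([1, 1, 1, 0], [1, 0, 1, 0])
loss = weighted_l1([1.0, 1.0, 0.5, 0.0], [1.0, 0.0, 1.0, 0.0], w)
```

With these inputs the hand-occluded pixel gets weight 0, so a prediction that correctly fills in the hidden part of the object is not penalized. The weighted loss is 0.5 / 3, compared with 1.5 / 4 for an unweighted mean. A soft weight (0 < `occluded_weight` < 1) is an equally plausible variant.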
Cite
Text
Zhang et al. "MOHO: Learning Single-View Hand-Held Object Reconstruction with Multi-View Occlusion-Aware Supervision." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00953
Markdown
[Zhang et al. "MOHO: Learning Single-View Hand-Held Object Reconstruction with Multi-View Occlusion-Aware Supervision." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/zhang2024cvpr-moho/) doi:10.1109/CVPR52733.2024.00953
BibTeX
@inproceedings{zhang2024cvpr-moho,
title = {{MOHO: Learning Single-View Hand-Held Object Reconstruction with Multi-View Occlusion-Aware Supervision}},
author = {Zhang, Chenyangguang and Jiao, Guanlong and Di, Yan and Wang, Gu and Huang, Ziqin and Zhang, Ruida and Manhardt, Fabian and Fu, Bowen and Tombari, Federico and Ji, Xiangyang},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2024},
pages = {9992-10002},
doi = {10.1109/CVPR52733.2024.00953},
url = {https://mlanthology.org/cvpr/2024/zhang2024cvpr-moho/}
}