Diffusion-Guided Reconstruction of Everyday Hand-Object Interaction Clips
Abstract
We tackle the task of reconstructing hand-object interactions from short video clips. Given an input video, our approach casts 3D inference as a per-video optimization and recovers a neural 3D representation of the object shape, as well as the time-varying motion and hand articulation. While the input video naturally provides some multi-view cues to guide 3D inference, these are insufficient on their own due to occlusions and limited viewpoint variations. To obtain accurate 3D, we augment the multi-view signals with generic data-driven priors to guide reconstruction. Specifically, we learn a diffusion network to model the conditional distribution of (geometric) renderings of objects conditioned on hand configuration and category label, and leverage it as a prior to guide the novel-view renderings of the reconstructed scene. We empirically evaluate our approach on egocentric videos across 6 object categories, and observe significant improvements over prior single-view and multi-view methods. Finally, we demonstrate our system's ability to reconstruct arbitrary clips from YouTube, showing both 1st and 3rd person interactions.
Cite
Text
Ye et al. "Diffusion-Guided Reconstruction of Everyday Hand-Object Interaction Clips." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.01806
Markdown
[Ye et al. "Diffusion-Guided Reconstruction of Everyday Hand-Object Interaction Clips." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/ye2023iccv-diffusionguided/) doi:10.1109/ICCV51070.2023.01806
BibTeX
@inproceedings{ye2023iccv-diffusionguided,
title = {{Diffusion-Guided Reconstruction of Everyday Hand-Object Interaction Clips}},
author = {Ye, Yufei and Hebbar, Poorvi and Gupta, Abhinav and Tulsiani, Shubham},
booktitle = {International Conference on Computer Vision},
year = {2023},
pages = {19717--19728},
doi = {10.1109/ICCV51070.2023.01806},
url = {https://mlanthology.org/iccv/2023/ye2023iccv-diffusionguided/}
}