G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis

Abstract

We propose G-HOP, a denoising diffusion-based generative prior for hand-object interactions that allows modeling both the 3D object and a human hand, conditioned on the object category. To learn a 3D spatial diffusion model that can capture this joint distribution, we represent the human hand via a skeletal distance field to obtain a representation aligned with the (latent) signed distance field for the object. We show that this hand-object prior can then serve as generic guidance to facilitate other tasks like reconstruction from interaction clips and human grasp synthesis. We believe that our model, trained by aggregating several diverse real-world interaction datasets spanning 155 categories, represents a first approach that allows jointly generating both hand and object. Our empirical evaluations demonstrate the benefit of this joint prior in video-based reconstruction and human grasp synthesis, outperforming current task-specific baselines.
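The abstract's key representational idea is that the hand and the object share one voxel grid: the hand becomes a skeletal distance field (distance from each grid point to each hand bone) stacked channel-wise with the object's signed distance field, so a 3D diffusion model can denoise both jointly. The sketch below illustrates that grid-aligned stacking only; it is not the authors' code, and the grid resolution, bone list, helper names, and the plain (non-latent) object SDF are illustrative assumptions.

import numpy as np

def voxel_grid(resolution=32, extent=0.15):
    """Regular 3D grid of query points, shape (R, R, R, 3), in meters."""
    axis = np.linspace(-extent, extent, resolution)
    xs, ys, zs = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.stack([xs, ys, zs], axis=-1)

def point_to_segment_distance(points, a, b):
    """Distance from each query point to the bone segment [a, b]."""
    ab = b - a
    t = np.clip(((points - a) @ ab) / (ab @ ab + 1e-9), 0.0, 1.0)
    closest = a + t[..., None] * ab
    return np.linalg.norm(points - closest, axis=-1)

def skeletal_distance_field(points, joints, bones):
    """Per-bone distances to the hand skeleton, shape (R, R, R, num_bones)."""
    flat = points.reshape(-1, 3)
    channels = [point_to_segment_distance(flat, joints[i], joints[j])
                for i, j in bones]
    return np.stack(channels, axis=-1).reshape(points.shape[:3] + (len(bones),))

# Toy usage: stack an object SDF and the hand's skeletal distance field as
# channels of one grid, the kind of input a 3D spatial diffusion model sees.
grid = voxel_grid(resolution=32)
object_sdf = np.linalg.norm(grid, axis=-1, keepdims=True) - 0.05  # sphere SDF (placeholder object)
joints = np.random.uniform(-0.1, 0.1, size=(21, 3))               # fake hand joints
bones = [(0, 1), (1, 2), (2, 3)]                                   # toy bone list, not the full hand
hand_field = skeletal_distance_field(grid, joints, bones)
diffusion_input = np.concatenate([object_sdf, hand_field], axis=-1)
print(diffusion_input.shape)  # (32, 32, 32, 1 + num_bones)

Because both fields live on the same grid, any gradient the diffusion prior provides with respect to this tensor can be pushed back to hand pose or object shape, which is how such a prior can act as guidance for reconstruction or grasp synthesis.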

Cite

Text

Ye et al. "G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00187

Markdown

[Ye et al. "G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/ye2024cvpr-ghop/) doi:10.1109/CVPR52733.2024.00187

BibTeX

@inproceedings{ye2024cvpr-ghop,
  title     = {{G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis}},
  author    = {Ye, Yufei and Gupta, Abhinav and Kitani, Kris and Tulsiani, Shubham},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {1911--1920},
  doi       = {10.1109/CVPR52733.2024.00187},
  url       = {https://mlanthology.org/cvpr/2024/ye2024cvpr-ghop/}
}