HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data

Abstract

3D hand-object interaction data is scarce due to the hardware constraints in scaling up the data collection process. In this paper, we propose HOIDiffusion for generating realistic and diverse 3D hand-object interaction data. Our model is a conditional diffusion model that takes both the 3D hand-object geometric structure and text description as inputs for image synthesis. This offers a more controllable and realistic synthesis, as we can specify the structure and style inputs in a disentangled manner. HOIDiffusion is trained by leveraging a diffusion model pre-trained on large-scale natural images and a few 3D human demonstrations. Beyond controllable image synthesis, we adopt the generated 3D data for learning 6D object pose estimation and show its effectiveness in improving perception systems. Project page: https://mq-zhang1.github.io/HOIDiffusion.

Cite

Text

Zhang et al. "HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00814

Markdown

[Zhang et al. "HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/zhang2024cvpr-hoidiffusion/) doi:10.1109/CVPR52733.2024.00814

BibTeX

@inproceedings{zhang2024cvpr-hoidiffusion,
  title     = {{HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data}},
  author    = {Zhang, Mengqi and Fu, Yang and Ding, Zheng and Liu, Sifei and Tu, Zhuowen and Wang, Xiaolong},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {8521--8531},
  doi       = {10.1109/CVPR52733.2024.00814},
  url       = {https://mlanthology.org/cvpr/2024/zhang2024cvpr-hoidiffusion/}
}