OpenHOI: Open-World Hand-Object Interaction Synthesis with Multimodal Large Language Model

Abstract

Understanding and synthesizing realistic 3D hand-object interactions (HOI) is critical for applications ranging from immersive AR/VR to dexterous robotics. Existing methods generalize poorly: they perform well on closed-set objects and predefined tasks but fail on unseen objects or open-vocabulary instructions. We introduce OpenHOI, the first framework for open-world HOI synthesis, capable of generating long-horizon manipulation sequences for novel objects guided by free-form language commands. Our approach integrates a 3D Multimodal Large Language Model (MLLM) fine-tuned for joint affordance grounding and semantic task decomposition, enabling it to precisely localize interaction regions (e.g., handles, buttons) and to break complex instructions (e.g., “Find a water bottle and take a sip”) into executable sub-tasks. To synthesize physically plausible interactions, we propose an affordance-driven diffusion model paired with a training-free physics refinement stage that minimizes hand-object penetration and optimizes affordance alignment. Evaluations across diverse scenarios demonstrate that OpenHOI outperforms state-of-the-art methods in generalizing to novel object categories, multi-stage tasks, and complex language instructions.
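The abstract does not give the exact objective of the training-free physics refinement stage. As a rough illustration only, the following sketch assumes a simplified version: the object is approximated by a sphere with an analytic signed distance, the hand is a small set of 3D points, and refinement is plain gradient descent on a weighted sum of a penetration penalty and an affordance-alignment term that pulls designated contact points toward their nearest affordance point. The function names (`sphere_sdf`, `refine_hand`), the weights, and the step size are hypothetical choices for this example, not the authors' method.

```python
import numpy as np


def sphere_sdf(points, center, radius):
    """Signed distance to a sphere: negative inside, positive outside."""
    return np.linalg.norm(points - center, axis=1) - radius


def refine_hand(hand_pts, center, radius, affordance_pts, contact_idx,
                steps=200, lr=0.05, w_pen=10.0, w_aff=1.0):
    """Hypothetical refinement: gradient descent on w_pen * E_pen + w_aff * E_aff.

    E_pen = sum_i max(0, -sdf(p_i))^2 penalizes hand points inside the object.
    E_aff = sum_{i in contact_idx} ||p_i - a_nn(i)||^2 pulls contact points
    toward their nearest affordance point (nearest neighbor fixed per step).
    No network is trained; only the hand points are optimized.
    """
    pts = np.asarray(hand_pts, dtype=float).copy()
    aff = np.asarray(affordance_pts, dtype=float)
    contact_idx = np.asarray(contact_idx)
    for _ in range(steps):
        grad = np.zeros_like(pts)

        # Penetration term: d/dp of max(0, -sdf)^2 is -2 * max(0, -sdf) * n,
        # where n = (p - c) / ||p - c|| is the outward surface normal.
        offset = pts - center
        dist = np.linalg.norm(offset, axis=1, keepdims=True)
        normal = offset / np.maximum(dist, 1e-9)
        depth = np.maximum(0.0, radius - dist)  # equals max(0, -sdf)
        grad += w_pen * (-2.0 * depth * normal)

        # Affordance term: squared distance to the nearest affordance point.
        contact = pts[contact_idx]
        d2 = ((contact[:, None, :] - aff[None, :, :]) ** 2).sum(axis=-1)
        nearest = aff[d2.argmin(axis=1)]
        grad[contact_idx] += w_aff * 2.0 * (contact - nearest)

        # Descent step: penetrating points move outward along the normal,
        # contact points move toward their affordance targets.
        pts -= lr * grad
    return pts


if __name__ == "__main__":
    center = np.zeros(3)
    hand = np.array([[0.5, 0.0, 0.0], [0.0, 0.2, 0.0], [0.0, 0.0, 1.5]])
    affordance = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
    refined = refine_hand(hand, center, 1.0, affordance, contact_idx=[2])
    print(np.round(refined, 4))
    print(np.round(sphere_sdf(refined, center, 1.0), 6))
```

In this toy setup the two interior points are pushed out to the sphere surface, and the contact point settles on the affordance point at the top of the sphere. A real system would replace the sphere with a mesh or learned signed distance field and optimize hand pose parameters rather than free points.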

Cite

Text

Zhang et al. "OpenHOI: Open-World Hand-Object Interaction Synthesis with Multimodal Large Language Model." Advances in Neural Information Processing Systems, 2025.

BibTeX

@inproceedings{zhang2025neurips-openhoi,
  title     = {{OpenHOI: Open-World Hand-Object Interaction Synthesis with Multimodal Large Language Model}},
  author    = {Zhang, Zhenhao and Shi, Ye and Yang, Lingxiao and Ni, Suting and Ye, Qi and Wang, Jingya},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/zhang2025neurips-openhoi/}
}