OpenHOI: Open-World Hand-Object Interaction Synthesis with Multimodal Large Language Model
Abstract
Understanding and synthesizing realistic 3D hand-object interactions (HOI) is critical for applications ranging from immersive AR/VR to dexterous robotics. Existing methods struggle with generalization, performing well on closed-set objects and predefined tasks but failing to handle unseen objects or open-vocabulary instructions. We introduce OpenHOI, the first framework for open-world HOI synthesis, capable of generating long-horizon manipulation sequences for novel objects guided by free-form language commands. Our approach integrates a 3D Multimodal Large Language Model (MLLM) fine-tuned for joint affordance grounding and semantic task decomposition, enabling precise localization of interaction regions (e.g., handles, buttons) and breakdown of complex instructions (e.g., “Find a water bottle and take a sip”) into executable sub-tasks. To synthesize physically plausible interactions, we propose an affordance-driven diffusion model paired with a training-free physics refinement stage that minimizes penetration and optimizes affordance alignment. Evaluations across diverse scenarios demonstrate OpenHOI’s superiority over state-of-the-art methods in generalizing to novel object categories, multi-stage tasks, and complex language instructions.
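The abstract does not say how the training-free physics refinement is implemented. As a rough illustration of the general idea only, the sketch below runs plain gradient descent on two terms: a penalty on penetration depth and a pull of designated contact points toward an affordance target. Everything in it is an assumption made for illustration rather than the authors' method: the object is a sphere with an analytic signed distance function, the hand is a free point set rather than a parametric hand model, and the names `sphere_sdf`, `refinement_loss_and_grad`, `refine`, `w_aff`, and `lr` are hypothetical.

```python
import numpy as np

def sphere_sdf(points, center, radius):
    """Signed distance to a sphere: negative inside, positive outside."""
    return np.linalg.norm(points - center, axis=1) - radius

def refinement_loss_and_grad(points, center, radius, contact_idx, target, w_aff=1.0):
    """Penetration penalty plus affordance-alignment term, with analytic gradient.

    Assumed loss (illustrative only):
        sum_i max(0, -sdf(p_i))^2  +  w_aff * ||p_contact - target||^2
    """
    diff = points - center
    dist = np.linalg.norm(diff, axis=1)
    depth = np.maximum(0.0, radius - dist)            # penetration depth per point
    normals = diff / np.maximum(dist, 1e-9)[:, None]  # outward direction (zero at the center)

    # Gradient of depth^2 w.r.t. each point is -2 * depth * outward normal.
    grad = -2.0 * depth[:, None] * normals
    loss = np.sum(depth ** 2)

    # Affordance term: pull the contact point toward the target location.
    aff_diff = points[contact_idx] - target
    loss += w_aff * np.sum(aff_diff ** 2)
    grad[contact_idx] += 2.0 * w_aff * aff_diff
    return loss, grad

def refine(points, center, radius, contact_idx, target, steps=200, lr=0.05):
    """Gradient descent on the point positions; no learned parameters involved."""
    pts = points.copy()
    for _ in range(steps):
        _, grad = refinement_loss_and_grad(pts, center, radius, contact_idx, target)
        pts -= lr * grad
    return pts

# Toy setup: unit sphere at the origin, one penetrating point, one contact point.
center = np.zeros(3)
hand_points = np.array([[0.5, 0.0, 0.0],   # starts inside the object
                        [2.0, 0.0, 0.0]])  # contact point, starts away from the target
target = np.array([1.0, 0.0, 0.0])         # hypothetical affordance location on the surface
refined = refine(hand_points, center, 1.0, contact_idx=1, target=target)
```

In this toy setup the penetrating point is pushed outward along the surface normal while the contact point is drawn to the target. A real system would optimize hand pose parameters against a mesh or learned distance field instead of moving points independently.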
Cite
Text
Zhang et al. "OpenHOI: Open-World Hand-Object Interaction Synthesis with Multimodal Large Language Model." Advances in Neural Information Processing Systems, 2025.
Markdown
[Zhang et al. "OpenHOI: Open-World Hand-Object Interaction Synthesis with Multimodal Large Language Model." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/zhang2025neurips-openhoi/)
BibTeX
@inproceedings{zhang2025neurips-openhoi,
title = {{OpenHOI: Open-World Hand-Object Interaction Synthesis with Multimodal Large Language Model}},
author = {Zhang, Zhenhao and Shi, Ye and Yang, Lingxiao and Ni, Suting and Ye, Qi and Wang, Jingya},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/zhang2025neurips-openhoi/}
}