InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions

Abstract

Achieving realistic simulations of humans interacting with a wide range of objects has long been a fundamental goal. Extending physics-based motion imitation to complex human-object interactions (HOIs) is challenging due to intricate human-object coupling, variability in object geometries, and artifacts in motion capture data, such as inaccurate contacts and limited hand detail. We introduce InterMimic, a framework that enables a single policy to robustly learn from hours of imperfect MoCap data covering diverse full-body interactions with dynamic and varied objects. Our key insight is to employ a curriculum strategy: perfect first, then scale up. We first train subject-specific teacher policies to mimic, retarget, and refine motion capture data. Next, we distill these teachers into a student policy, with the teachers acting as online experts providing direct supervision, as well as high-quality references. Notably, we incorporate RL fine-tuning on the student policy to surpass mere demonstration replication and achieve higher-quality solutions. Our experiments demonstrate that InterMimic produces realistic and diverse interactions across multiple HOI datasets. The learned policy generalizes in a zero-shot manner and seamlessly integrates with kinematic generators, elevating the framework from mere imitation to generative modeling of complex human-object interactions.

Cite

Text

Xu et al. "InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.01145

Markdown

[Xu et al. "InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/xu2025cvpr-intermimic/) doi:10.1109/CVPR52734.2025.01145

BibTeX

@inproceedings{xu2025cvpr-intermimic,
  title     = {{InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions}},
  author    = {Xu, Sirui and Ling, Hung Yu and Wang, Yu-Xiong and Gui, Liang-Yan},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {12266--12277},
  doi       = {10.1109/CVPR52734.2025.01145},
  url       = {https://mlanthology.org/cvpr/2025/xu2025cvpr-intermimic/}
}