OKAMI: Teaching Humanoid Robots Manipulation Skills Through Single Video Imitation
Abstract
We study the problem of teaching humanoid robots manipulation skills by imitating from single video demonstrations. We introduce OKAMI, a method that generates a manipulation plan from a single RGB-D video and derives a policy for execution. At the heart of our approach is object-aware retargeting, which enables the humanoid robot to mimic the human motions in an RGB-D video while adjusting to different object locations during deployment. OKAMI uses open-world vision models to identify task-relevant objects and retargets the body motions and hand poses separately. Our experiments show that OKAMI achieves strong generalization across varying visual and spatial conditions, outperforming the state-of-the-art baseline on open-world imitation from observation. Furthermore, OKAMI rollout trajectories are leveraged to train closed-loop visuomotor policies, which achieve an average success rate of 79.2% without the need for labor-intensive teleoperation. More videos can be found on our website https://ut-austin-rpl.github.io/OKAMI/.
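To make the object-aware retargeting idea in the abstract concrete, below is a minimal Python sketch of that flow. It is not OKAMI's implementation: the function names (locate_object, retarget_body, retarget_hands), the reduction of object localization to a single 3D centroid, and the translation-based warping of the demonstrated trajectory are all illustrative assumptions; the only points taken from the abstract are that task-relevant objects are identified with vision models, the motion is adjusted to the object's deployment-time location, and body and hand retargeting are handled separately.

```python
"""Illustrative sketch of an object-aware retargeting loop (assumptions noted inline)."""
import numpy as np


def locate_object(rgbd_frame, object_name):
    """Placeholder for an open-world vision model that localizes a named object.
    Here it just returns a fixed dummy 3D position (assumption, not the paper's code)."""
    dummy_positions = {"cup": np.array([0.5, 0.1, 0.9])}
    return dummy_positions.get(object_name, np.zeros(3))


def retarget_body(wrist_traj):
    """Placeholder: map human wrist trajectory to robot end-effector targets.
    A real system would solve inverse kinematics for the humanoid arm."""
    return wrist_traj  # identity mapping for illustration


def retarget_hands(hand_poses):
    """Placeholder: map human finger poses to robot hand joint commands."""
    return hand_poses  # identity mapping for illustration


def object_aware_retarget(demo_wrist_traj, demo_hand_poses,
                          demo_frame, live_frame, object_name):
    # 1. Localize the task-relevant object in the demo video and at deployment.
    obj_demo = locate_object(demo_frame, object_name)
    obj_live = locate_object(live_frame, object_name)

    # 2. Warp the demonstrated trajectory by the object-position offset so the
    #    motion is expressed relative to where the object is now (simplifying assumption).
    offset = obj_live - obj_demo
    warped_traj = demo_wrist_traj + offset

    # 3. Retarget body motion and hand poses separately, as the abstract describes.
    ee_targets = retarget_body(warped_traj)
    hand_cmds = retarget_hands(demo_hand_poses)
    return ee_targets, hand_cmds


if __name__ == "__main__":
    demo_traj = np.linspace([0.3, 0.0, 1.0], [0.5, 0.1, 0.9], num=50)  # dummy wrist path
    demo_hands = np.zeros((50, 12))  # dummy per-frame hand pose vectors
    ee, hands = object_aware_retarget(demo_traj, demo_hands,
                                      demo_frame=None, live_frame=None,
                                      object_name="cup")
    print(ee.shape, hands.shape)
```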
Cite
Text
Li et al. "OKAMI: Teaching Humanoid Robots Manipulation Skills Through Single Video Imitation." Proceedings of The 8th Conference on Robot Learning, 2024.Markdown
[Li et al. "OKAMI: Teaching Humanoid Robots Manipulation Skills Through Single Video Imitation." Proceedings of The 8th Conference on Robot Learning, 2024.](https://mlanthology.org/corl/2024/li2024corl-okami/)BibTeX
@inproceedings{li2024corl-okami,
  title     = {{OKAMI: Teaching Humanoid Robots Manipulation Skills Through Single Video Imitation}},
  author    = {Li, Jinhan and Zhu, Yifeng and Xie, Yuqi and Jiang, Zhenyu and Seo, Mingyo and Pavlakos, Georgios and Zhu, Yuke},
  booktitle = {Proceedings of The 8th Conference on Robot Learning},
  year      = {2024},
  volume    = {270},
  pages     = {299-317},
  url       = {https://mlanthology.org/corl/2024/li2024corl-okami/}
}