DexMV: Imitation Learning for Dexterous Manipulation from Human Videos
Abstract
While significant progress has been made in computer vision on understanding hand-object interactions, it remains very challenging for robots to perform complex dexterous manipulation. In this paper, we propose a new platform and pipeline, DexMV (Dexterous Manipulation from Videos), for imitation learning to bridge the gap between computer vision and robot learning. We design a platform with: (i) a simulation system for complex dexterous manipulation tasks with a multi-finger robot hand and (ii) a computer vision system to record large-scale demonstrations of a human hand conducting the same tasks. In the DexMV pipeline, we couple 3D hand and object pose estimation on the videos with a hand motion retargeting algorithm to extract hand-object state trajectories. We compare multiple imitation learning and reinforcement learning (RL) algorithms on the manipulation tasks in simulation. We show that the demonstrations indeed improve robot learning by a large margin and enable solving complex tasks that RL alone cannot.
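To make the retargeting step of the pipeline concrete, below is a minimal, hypothetical sketch in Python. It assumes a robot-hand forward-kinematics function fingertip_positions(q) (a placeholder, not part of DexMV) and matches human and robot fingertip keypoints with a simple least-squares objective; the paper's actual retargeting formulation differs in its details.

import numpy as np
from scipy.optimize import minimize

def fingertip_positions(q):
    # Placeholder kinematics: in practice this comes from the robot hand model.
    return q.reshape(5, 3)

def retarget_frame(human_fingertips, q_init):
    """Find robot joint values whose fingertips best match the human keypoints."""
    def cost(q):
        diff = fingertip_positions(q) - human_fingertips
        return np.sum(diff ** 2)
    return minimize(cost, q_init, method="L-BFGS-B").x

# Usage: retarget each frame of the estimated human hand trajectory,
# warm-starting from the previous solution for temporal smoothness.
human_traj = [np.random.rand(5, 3) for _ in range(10)]  # stand-in for pose estimates
q = np.zeros(15)
robot_traj = []
for frame in human_traj:
    q = retarget_frame(frame, q)
    robot_traj.append(q)

The resulting joint trajectories, paired with the estimated object poses, form the hand-object state demonstrations that are then fed to the imitation learning algorithms.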
Cite
Text
Qin et al. "DexMV: Imitation Learning for Dexterous Manipulation from Human Videos." Proceedings of the European Conference on Computer Vision (ECCV), 2022. doi:10.1007/978-3-031-19842-7_33

Markdown
[Qin et al. "DexMV: Imitation Learning for Dexterous Manipulation from Human Videos." Proceedings of the European Conference on Computer Vision (ECCV), 2022.](https://mlanthology.org/eccv/2022/qin2022eccv-dexmv/) doi:10.1007/978-3-031-19842-7_33

BibTeX
@inproceedings{qin2022eccv-dexmv,
title = {{DexMV: Imitation Learning for Dexterous Manipulation from Human Videos}},
author = {Qin, Yuzhe and Wu, Yueh-Hua and Liu, Shaowei and Jiang, Hanwen and Yang, Ruihan and Fu, Yang and Wang, Xiaolong},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2022},
doi = {10.1007/978-3-031-19842-7_33},
url = {https://mlanthology.org/eccv/2022/qin2022eccv-dexmv/}
}