Zero-Shot Visual Imitation

Abstract

Imitating expert demonstrations is a powerful mechanism for learning to perform tasks from raw sensory observations. The current dominant paradigm in learning from demonstration (LfD) [3,16,19,20] requires the expert to either manually move the robot joints (i.e., kinesthetic teaching) or teleoperate the robot to execute the desired task. The expert typically provides multiple demonstrations of a task at training time, which generates data in the form of observation-action pairs from the agent's point of view. The agent then distills this data into a policy for performing the task of interest. Such a heavily supervised approach, in which demonstrations must be provided by controlling the robot, is tedious for the human expert. Moreover, for every new task that the robot needs to execute, the expert is required to provide a new set of demonstrations.
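The supervised LfD pipeline the abstract criticizes can be sketched as behavioral cloning: expert demonstrations yield (observation, action) pairs, and the agent fits a policy to them by supervised learning. The minimal example below uses a linear policy fit by least squares purely for illustration; all names, dimensions, and the linear-policy choice are assumptions, not the paper's method (the paper's contribution is avoiding this kind of action supervision).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical expert data: observations o, expert actions a = W_true @ o.
# In real LfD these pairs come from kinesthetic teaching or teleoperation.
obs_dim, act_dim, n = 4, 2, 500
W_true = rng.normal(size=(act_dim, obs_dim))
observations = rng.normal(size=(n, obs_dim))
actions = observations @ W_true.T

# "Distill" the demonstrations into a policy: here, a linear map fit by
# least squares to imitate the expert's actions on the expert's observations.
W_policy, *_ = np.linalg.lstsq(observations, actions, rcond=None)

def policy(obs):
    """Map a raw observation to an action using the cloned policy."""
    return obs @ W_policy

# The cloned policy matches the expert on held-out observations, but a new
# task would require collecting a whole new set of demonstrations.
test_obs = rng.normal(size=(10, obs_dim))
print(np.allclose(policy(test_obs), test_obs @ W_true.T, atol=1e-6))
```

This toy makes concrete why the paradigm is expensive: the supervision is the expert's actions themselves, so the cost of collecting `(observations, actions)` scales with every new task.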

Cite

Text

Pathak et al. "Zero-Shot Visual Imitation." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2018. doi:10.1109/CVPRW.2018.00278

Markdown

[Pathak et al. "Zero-Shot Visual Imitation." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2018.](https://mlanthology.org/cvprw/2018/pathak2018cvprw-zeroshot/) doi:10.1109/CVPRW.2018.00278

BibTeX

@inproceedings{pathak2018cvprw-zeroshot,
  title     = {{Zero-Shot Visual Imitation}},
  author    = {Pathak, Deepak and Mahmoudieh, Parsa and Luo, Guanghao and Agrawal, Pulkit and Chen, Dian and Shentu, Yide and Shelhamer, Evan and Malik, Jitendra and Efros, Alexei A. and Darrell, Trevor},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2018},
  pages     = {2050--2053},
  doi       = {10.1109/CVPRW.2018.00278},
  url       = {https://mlanthology.org/cvprw/2018/pathak2018cvprw-zeroshot/}
}