Learning Robot Activities from First-Person Human Videos Using Convolutional Future Regression

Abstract

We design a new approach that enables a robot to learn new activities from unlabeled human example videos. Given videos of humans executing an activity from their own viewpoint (i.e., first-person videos), our objective is to make the robot learn the temporal structure of the activity as its future regression network, and to transfer such a model to its own motor execution. We present a new fully convolutional neural network architecture that regresses the intermediate scene representation corresponding to the future frame, thereby enabling explicit forecasting of future hand locations given the current frame. The full version of the paper is available as [2].
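As a rough illustration of the idea only (not the authors' implementation), the sketch below shows a minimal fully convolutional future-regression module in PyTorch: it takes a convolutional feature map of the current frame and regresses a feature map of the same shape for a future frame, from which future hand locations could then be decoded. All layer sizes, names, and the training loss are hypothetical assumptions.

import torch
import torch.nn as nn

class FutureRegressionCNN(nn.Module):
    """Minimal sketch: regress the feature map of a future frame
    from the feature map of the current frame (sizes hypothetical)."""
    def __init__(self, channels: int = 256):
        super().__init__()
        # Fully convolutional: no flattening, so spatial structure is preserved
        self.regressor = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
        )

    def forward(self, current_feat: torch.Tensor) -> torch.Tensor:
        # Input/output: (batch, channels, H, W) feature maps
        return self.regressor(current_feat)

# Usage sketch: train with a regression loss between the predicted map and
# the feature map extracted from a frame a short time in the future.
model = FutureRegressionCNN()
current = torch.randn(1, 256, 14, 14)        # hypothetical current-frame features
future_target = torch.randn(1, 256, 14, 14)  # hypothetical future-frame features
loss = nn.functional.mse_loss(model(current), future_target)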

Cite

Text

Lee and Ryoo. "Learning Robot Activities from First-Person Human Videos Using Convolutional Future Regression." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2017. doi:10.1109/CVPRW.2017.63

Markdown

[Lee and Ryoo. "Learning Robot Activities from First-Person Human Videos Using Convolutional Future Regression." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2017.](https://mlanthology.org/cvprw/2017/lee2017cvprw-learning/) doi:10.1109/CVPRW.2017.63

BibTeX

@inproceedings{lee2017cvprw-learning,
  title     = {{Learning Robot Activities from First-Person Human Videos Using Convolutional Future Regression}},
  author    = {Lee, Jangwon and Ryoo, Michael S.},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2017},
  pages     = {472-473},
  doi       = {10.1109/CVPRW.2017.63},
  url       = {https://mlanthology.org/cvprw/2017/lee2017cvprw-learning/}
}