Visual Semantic Planning Using Deep Successor Representations

Abstract

A crucial capability of real-world intelligent agents is their ability to plan a sequence of actions to achieve their goals in the visual world. In this work, we address the problem of visual semantic planning: the task of predicting a sequence of actions from visual observations that transforms a dynamic environment from an initial state to a goal state. Doing so entails knowledge about objects and their affordances, as well as actions and their preconditions and effects. We propose learning these through interaction with a visual and dynamic environment. Our solution bootstraps reinforcement learning with imitation learning. To ensure cross-task generalization, we develop a deep predictive model based on successor representations. Our experiments show near-optimal performance across a wide range of tasks in the challenging THOR environment. The supplementary video can be accessed at the following link: https://goo.gl/vXsbQP.
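
The abstract does not spell out the model, but the successor-representation idea it builds on has a standard decomposition: the action value factors as Q(s, a) = ψ(s, a) · w, where ψ(s, a) is the expected discounted sum of future state features and w maps features to immediate reward, so w can be swapped per task while ψ transfers. The sketch below is a minimal tabular illustration of that decomposition, not the authors' deep model: the one-hot features phi, the toy chain dynamics, and all variable names are assumptions made for this example.

import numpy as np

# Toy successor-representation (SR) demo: Q(s, a) = psi(s, a) . w.
# Tabular stand-in for the paper's deep predictive model; in the paper,
# phi would be produced by a neural network rather than a one-hot table.
n_states, n_actions, gamma, alpha = 5, 2, 0.95, 0.1
rng = np.random.default_rng(0)

phi = np.eye(n_states)                            # one-hot state features phi(s)
psi = np.zeros((n_states, n_actions, n_states))   # successor features psi(s, a)
w = np.zeros(n_states)                            # reward weights: r(s) ~ phi(s) . w

def q_values(s):
    # Q(s, a) = psi(s, a) . w for every action a.
    return psi[s] @ w

def td_update(s, a, r, s_next):
    # One TD(0) step for both the successor features and the reward weights.
    a_next = int(np.argmax(q_values(s_next)))     # greedy bootstrap action
    target = phi[s] + gamma * psi[s_next, a_next] # SR Bellman target
    psi[s, a] += alpha * (target - psi[s, a])
    w[:] += alpha * (r - phi[s] @ w) * phi[s]     # regress reward onto features

# Toy chain MDP with reflecting ends: action 0 moves left, action 1 moves
# right; being in the last state yields reward 1. Transitions are sampled
# from uniformly random (state, action) pairs.
for _ in range(5000):
    s = int(rng.integers(n_states))
    a = int(rng.integers(n_actions))
    s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    r = 1.0 if s == n_states - 1 else 0.0
    td_update(s, a, r, s_next)

print("Q(s, a) table:\n", np.array([q_values(s) for s in range(n_states)]))

After training, right-moving actions score higher for every interior state, and retargeting the agent to a new task only requires relearning w, which is the cross-task property the abstract highlights.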

Cite

Text

Zhu et al. "Visual Semantic Planning Using Deep Successor Representations." International Conference on Computer Vision, 2017. doi:10.1109/ICCV.2017.60

Markdown

[Zhu et al. "Visual Semantic Planning Using Deep Successor Representations." International Conference on Computer Vision, 2017.](https://mlanthology.org/iccv/2017/zhu2017iccv-visual/) doi:10.1109/ICCV.2017.60

BibTeX

@inproceedings{zhu2017iccv-visual,
  title     = {{Visual Semantic Planning Using Deep Successor Representations}},
  author    = {Zhu, Yuke and Gordon, Daniel and Kolve, Eric and Fox, Dieter and Fei-Fei, Li and Gupta, Abhinav and Mottaghi, Roozbeh and Farhadi, Ali},
  booktitle = {International Conference on Computer Vision},
  year      = {2017},
  doi       = {10.1109/ICCV.2017.60},
  url       = {https://mlanthology.org/iccv/2017/zhu2017iccv-visual/}
}