Value Prediction Network

Abstract

This paper proposes a novel deep reinforcement learning (RL) architecture, called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network. In contrast to typical model-based RL methods, VPN learns a dynamics model whose abstract states are trained to make option-conditional predictions of future values (discounted sum of rewards) rather than of future observations. Our experimental results show that VPN has several advantages over both model-free and model-based baselines in a stochastic environment where careful planning is required but building an accurate observation-prediction model is difficult. Furthermore, VPN outperforms Deep Q-Network (DQN) on several Atari games even with short-lookahead planning, demonstrating its potential as a new way of learning a good state representation.
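To make the abstract's key idea concrete, below is a minimal, illustrative Python sketch of planning over learned abstract states rather than predicted observations. The four module interfaces (encoding, value, outcome, transition) follow the paper's high-level description, but the linear stand-in "networks", the option set, and the specific lookahead mixing rule here are assumptions for illustration, not the authors' implementation.

# Illustrative sketch of VPN-style lookahead over learned abstract states.
# Module names and the toy parameters are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, NUM_OPTIONS = 8, 4

# Stand-in parameters for the four learned modules.
W_enc = rng.normal(size=(STATE_DIM, STATE_DIM))
w_val = rng.normal(size=STATE_DIM)
W_trans = rng.normal(size=(NUM_OPTIONS, STATE_DIM, STATE_DIM))
w_rew = rng.normal(size=(NUM_OPTIONS, STATE_DIM))

def encode(observation):   # encoding module: observation -> abstract state
    return np.tanh(W_enc @ observation)

def value(s):               # value module: abstract state -> value estimate
    return float(w_val @ s)

def outcome(s, o):          # outcome module: (state, option) -> reward, discount
    return float(w_rew[o] @ s), 0.99

def transition(s, o):       # transition module: (state, option) -> next abstract state
    return np.tanh(W_trans[o] @ s)

def plan(s, depth):
    """d-step lookahead done entirely in abstract-state space,
    without ever predicting future observations."""
    if depth == 1:
        return value(s)
    q_values = []
    for o in range(NUM_OPTIONS):
        r, gamma = outcome(s, o)
        q_values.append(r + gamma * plan(transition(s, o), depth - 1))
    # Mix the direct value estimate with the best backed-up estimate.
    return (1.0 / depth) * value(s) + ((depth - 1.0) / depth) * max(q_values)

obs = rng.normal(size=STATE_DIM)
print("2-step planned value:", plan(encode(obs), depth=2))

Running the sketch simply prints a planned value for a random observation; the point is that the rollout is conditioned on options and grounded only in predicted rewards, discounts, and values, which is the contrast the abstract draws with observation-prediction models.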

Cite

Text

Oh et al. "Value Prediction Network." Neural Information Processing Systems, 2017.

Markdown

[Oh et al. "Value Prediction Network." Neural Information Processing Systems, 2017.](https://mlanthology.org/neurips/2017/oh2017neurips-value/)

BibTeX

@inproceedings{oh2017neurips-value,
  title     = {{Value Prediction Network}},
  author    = {Oh, Junhyuk and Singh, Satinder and Lee, Honglak},
  booktitle = {Neural Information Processing Systems},
  year      = {2017},
  pages     = {6118-6128},
  url       = {https://mlanthology.org/neurips/2017/oh2017neurips-value/}
}