Value Propagation Networks
Abstract
We present Value Propagation (VProp), a set of parameter-efficient differentiable planning modules built on Value Iteration that can be trained with reinforcement learning to solve unseen tasks, generalize to larger map sizes, and learn to navigate in dynamic environments. We show that the modules enable learning to plan when the environment also includes stochastic elements, providing a cost-efficient learning system for building low-level size-invariant planners for a variety of interactive navigation problems. We evaluate on static and dynamic configurations of MazeBase grid-worlds, with randomly generated environments of several different sizes, and on a StarCraft navigation scenario with more complex dynamics and pixels as input.
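As context for the abstract, the classical Value Iteration algorithm that VProp builds on can be sketched on a 4-connected grid world. This is a minimal tabular illustration, not the paper's learned convolutional modules; the function name, reward/passability encoding, and discount factor are all assumptions for the sketch.

```python
def value_iteration(reward, passable, gamma=0.9, iters=50):
    """Tabular value iteration on a 4-connected grid (didactic sketch).

    reward[y][x]   -- reward for entering cell (y, x)
    passable[y][x] -- True if the cell is free (not an obstacle)
    Returns a grid of state values after `iters` backups.
    """
    h, w = len(reward), len(reward[0])
    v = [[0.0] * w for _ in range(h)]
    for _ in range(iters):
        nxt = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                if not passable[y][x]:
                    continue
                best = 0.0
                # Back up the best neighboring action value.
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and passable[ny][nx]:
                        best = max(best, reward[ny][nx] + gamma * v[ny][nx])
                nxt[y][x] = best
        v = nxt
    return v

# Tiny 1x3 corridor with a goal reward at the right end: values grow
# toward the goal, yielding a greedy policy that walks right.
values = value_iteration([[0.0, 0.0, 1.0]], [[True, True, True]])
```

VProp replaces these hand-written max/backup operations with learned, convolution-like modules, which is what makes the planner differentiable and size-invariant.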
Cite
Text
Nardelli et al. "Value Propagation Networks." International Conference on Learning Representations, 2019.

Markdown

[Nardelli et al. "Value Propagation Networks." International Conference on Learning Representations, 2019.](https://mlanthology.org/iclr/2019/nardelli2019iclr-value/)

BibTeX
@inproceedings{nardelli2019iclr-value,
title = {{Value Propagation Networks}},
author = {Nardelli, Nantas and Synnaeve, Gabriel and Lin, Zeming and Kohli, Pushmeet and Torr, Philip H. S. and Usunier, Nicolas},
booktitle = {International Conference on Learning Representations},
year = {2019},
url = {https://mlanthology.org/iclr/2019/nardelli2019iclr-value/}
}