Learning Continuous Control Policies by Stochastic Value Gradients
Abstract
We present a unified framework for learning continuous control policies using backpropagation. It supports stochastic control by treating stochasticity in the Bellman equation as a deterministic function of exogenous noise. The product is a spectrum of general policy gradient algorithms that range from model-free methods with value functions to model-based methods without value functions. We use learned models but only require observations from the environment instead of observations from model-predicted trajectories, minimizing the impact of compounded model errors. We apply these algorithms first to a toy stochastic control problem and then to several physics-based control problems in simulation. One of these variants, SVG(1), shows the effectiveness of learning models, value functions, and policies simultaneously in continuous domains.
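The abstract's central idea, treating the noise in the Bellman equation as an exogenous, deterministic input, is what allows gradients to be backpropagated through sampled actions. The sketch below illustrates that reparameterization in the spirit of the model-free SVG(0) variant, where the value gradient flows through a learned action-value function into the policy parameters. The network sizes, Gaussian policy form, environment stand-in, and optimizer settings are assumptions made for illustration, not the authors' implementation.

# Minimal sketch of an SVG(0)-style update: the stochastic policy is written as a
# deterministic function of the state and exogenous noise (reparameterization),
# so the gradient of a learned Q-function can be backpropagated into the policy.
# Dimensions, architectures, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

state_dim, action_dim = 4, 2

# Learned action-value function Q(s, a) (assumed already being trained elsewhere).
q_fn = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.Tanh(), nn.Linear(64, 1))

# Gaussian policy: mean network plus a learnable log standard deviation.
policy_mean = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(), nn.Linear(64, action_dim))
log_std = nn.Parameter(torch.zeros(action_dim))

policy_opt = torch.optim.Adam(list(policy_mean.parameters()) + [log_std], lr=1e-3)

def reparameterized_action(state):
    """Sample a = mu(s) + sigma * eps with exogenous noise eps ~ N(0, I)."""
    eps = torch.randn(state.shape[0], action_dim)  # exogenous noise, carries no gradient
    return policy_mean(state) + torch.exp(log_std) * eps

# One policy update on a batch of observed states (environment observations,
# not model-predicted trajectories, per the abstract).
states = torch.randn(32, state_dim)          # stand-in for observed states
actions = reparameterized_action(states)     # differentiable w.r.t. policy parameters
policy_loss = -q_fn(torch.cat([states, actions], dim=-1)).mean()

policy_opt.zero_grad()
policy_loss.backward()                       # value gradient flows back through the action
policy_opt.step()

Because the noise is sampled outside the policy's deterministic mapping, the expectation over actions can be differentiated directly; the same device, applied through a learned dynamics model and value function, yields the SVG(1) variant highlighted in the abstract.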
Cite
Text
Heess et al. "Learning Continuous Control Policies by Stochastic Value Gradients." Neural Information Processing Systems, 2015.
Markdown
[Heess et al. "Learning Continuous Control Policies by Stochastic Value Gradients." Neural Information Processing Systems, 2015.](https://mlanthology.org/neurips/2015/heess2015neurips-learning/)
BibTeX
@inproceedings{heess2015neurips-learning,
title = {{Learning Continuous Control Policies by Stochastic Value Gradients}},
author = {Heess, Nicolas and Wayne, Gregory and Silver, David and Lillicrap, Timothy and Erez, Tom and Tassa, Yuval},
booktitle = {Neural Information Processing Systems},
year = {2015},
pages = {2944--2952},
url = {https://mlanthology.org/neurips/2015/heess2015neurips-learning/}
}