Continuous Control with Deep Reinforcement Learning

Abstract

We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
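The abstract describes an actor-critic algorithm built on the deterministic policy gradient: a deterministic actor maps states to continuous actions, a critic estimates Q-values, and slowly updated target networks stabilize bootstrapping. The sketch below is not the authors' code; it is a minimal illustration of that update in PyTorch, where the network sizes, hyper-parameters (gamma, tau, learning rates), and the batch layout are illustrative assumptions.

import copy
import torch
import torch.nn as nn

obs_dim, act_dim = 3, 1          # assumed dimensions for a toy continuous-control task
gamma, tau = 0.99, 0.005         # assumed discount factor and soft-update rate

# Deterministic actor mu(s) and critic Q(s, a); architectures are illustrative
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                      nn.Linear(64, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))

# Target networks, updated slowly for stable bootstrapped Q-targets
actor_targ = copy.deepcopy(actor)
critic_targ = copy.deepcopy(critic)

actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def update(batch):
    """One actor-critic update from a replay-buffer batch of (s, a, r, s', done)."""
    s, a, r, s2, done = batch

    # Critic: regress Q(s, a) toward the bootstrapped target r + gamma * Q'(s', mu'(s'))
    with torch.no_grad():
        a2 = actor_targ(s2)
        q_targ = r + gamma * (1 - done) * critic_targ(torch.cat([s2, a2], dim=-1))
    q = critic(torch.cat([s, a], dim=-1))
    critic_loss = ((q - q_targ) ** 2).mean()
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: deterministic policy gradient, i.e. ascend Q(s, mu(s)) in the actor's parameters
    actor_loss = -critic(torch.cat([s, actor(s)], dim=-1)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Polyak-averaged soft update of the target networks
    with torch.no_grad():
        for p, p_t in zip(actor.parameters(), actor_targ.parameters()):
            p_t.mul_(1 - tau).add_(tau * p)
        for p, p_t in zip(critic.parameters(), critic_targ.parameters()):
            p_t.mul_(1 - tau).add_(tau * p)

In the full method, exploration in the continuous action space is handled by adding noise to the actor's output when collecting experience, and transitions are drawn from a replay buffer; both are omitted here to keep the sketch focused on the two gradient steps.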

Cite

Text

Lillicrap et al. "Continuous Control with Deep Reinforcement Learning." International Conference on Learning Representations, 2016.

Markdown

[Lillicrap et al. "Continuous Control with Deep Reinforcement Learning." International Conference on Learning Representations, 2016.](https://mlanthology.org/iclr/2016/lillicrap2016iclr-continuous/)

BibTeX

@inproceedings{lillicrap2016iclr-continuous,
  title     = {{Continuous Control with Deep Reinforcement Learning}},
  author    = {Lillicrap, Timothy P. and Hunt, Jonathan J. and Pritzel, Alexander and Heess, Nicolas and Erez, Tom and Tassa, Yuval and Silver, David and Wierstra, Daan},
  booktitle = {International Conference on Learning Representations},
  year      = {2016},
  url       = {https://mlanthology.org/iclr/2016/lillicrap2016iclr-continuous/}
}