Learning Complex Neural Network Policies with Trajectory Optimization

Abstract

Direct policy search methods offer the promise of automatically learning controllers for complex, high-dimensional tasks. However, prior applications of policy search often required specialized, low-dimensional policy classes, limiting their generality. In this work, we introduce a policy search algorithm that can directly learn high-dimensional, general-purpose policies, represented by neural networks. We formulate the policy search problem as an optimization over trajectory distributions, alternating between optimizing the policy to match the trajectory distributions, and optimizing the trajectory distributions to match the policy and minimize expected cost. Our method can learn policies for complex tasks such as bipedal push recovery and walking on uneven terrain, while outperforming prior methods.
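
To make the alternating scheme concrete, below is a minimal sketch of the idea under heavy simplifying assumptions: toy linear dynamics, a linear policy standing in for the neural network, and naive finite-difference descent standing in for the paper's trajectory optimizer. All names, costs, and constants are illustrative, not the authors' implementation.

import numpy as np

# Illustrative sketch of the alternation described in the abstract:
# (1) fit the policy to the actions along the current trajectory (supervised step),
# (2) re-optimize the trajectory to reduce cost while staying close to the
#     policy's actions. Everything here (linear policy, quadratic cost,
#     finite-difference descent) is an assumption for the toy example.

rng = np.random.default_rng(0)

T, x_dim, u_dim = 20, 4, 2
A = np.eye(x_dim) + 0.05 * rng.standard_normal((x_dim, x_dim))  # toy linear dynamics
B = 0.1 * rng.standard_normal((x_dim, u_dim))

def cost(xs, us):
    # quadratic state/action cost (assumed for this example)
    return np.sum(xs ** 2) + 0.1 * np.sum(us ** 2)

def rollout(x0, us):
    xs = [x0]
    for u in us:
        xs.append(A @ xs[-1] + B @ u)
    return np.array(xs[:-1])

x0 = rng.standard_normal(x_dim)
us = 0.1 * rng.standard_normal((T, u_dim))  # trajectory actions
W = np.zeros((u_dim, x_dim))                # linear "policy" (stand-in for a net)
rho = 1.0                                   # weight on policy-agreement penalty

for iteration in range(10):
    xs = rollout(x0, us)

    # Step 1: fit the policy to the trajectory (least-squares regression).
    W = np.linalg.lstsq(xs, us, rcond=None)[0].T

    # Step 2: update the actions to lower cost while matching the policy,
    # via finite-difference descent on an augmented objective.
    def augmented(us_flat):
        us_ = us_flat.reshape(T, u_dim)
        xs_ = rollout(x0, us_)
        agree = np.sum((us_ - xs_ @ W.T) ** 2)  # deviation from policy actions
        return cost(xs_, us_) + rho * agree

    g = np.zeros(T * u_dim)
    f0 = augmented(us.ravel())
    eps = 1e-4
    for i in range(T * u_dim):
        d = np.zeros(T * u_dim)
        d[i] = eps
        g[i] = (augmented(us.ravel() + d) - f0) / eps
    us = (us.ravel() - 0.01 * g).reshape(T, u_dim)

print("final cost:", cost(rollout(x0, us), us))

Each iteration performs the two steps named in the abstract: a supervised fit of the policy to the current trajectory, followed by a trajectory update that trades off expected cost against agreement with the policy.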

Cite

Text

Levine and Koltun. "Learning Complex Neural Network Policies with Trajectory Optimization." International Conference on Machine Learning, 2014.

Markdown

[Levine and Koltun. "Learning Complex Neural Network Policies with Trajectory Optimization." International Conference on Machine Learning, 2014.](https://mlanthology.org/icml/2014/levine2014icml-learning/)

BibTeX

@inproceedings{levine2014icml-learning,
  title     = {{Learning Complex Neural Network Policies with Trajectory Optimization}},
  author    = {Levine, Sergey and Koltun, Vladlen},
  booktitle = {International Conference on Machine Learning},
  year      = {2014},
  pages     = {829--837},
  volume    = {32},
  url       = {https://mlanthology.org/icml/2014/levine2014icml-learning/}
}