Model-Free Trajectory Optimization for Reinforcement Learning

Abstract

Many recent Trajectory Optimization algorithms alternate between a local approximation of the dynamics and a conservative policy update. However, linearly approximating the dynamics in order to derive the new policy can bias the update and prevent convergence to the optimal policy. In this article, we propose a new model-free algorithm that backpropagates a local quadratic time-dependent Q-Function, allowing the derivation of the policy update in closed form. Our policy update ensures exact KL-constraint satisfaction without simplifying assumptions on the system dynamics, and it demonstrates improved performance compared to related Trajectory Optimization algorithms that linearize the dynamics.
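
The sketch below (not the authors' code) illustrates the kind of closed-form update the abstract refers to: for a time-dependent linear-Gaussian policy and a locally quadratic Q-model in the action, the KL-regularized maximization of the expected Q-value has a Gaussian solution that can be written down analytically. The block names `Q_aa`, `Q_as`, `q_a` for the quadratic model and treats the KL multiplier `eta` as given (in practice it would be obtained by optimizing the dual of the KL-constrained problem); all function and variable names are hypothetical.

import numpy as np


def kl_regularized_update(K, k, Sigma, Q_aa, Q_as, q_a, eta):
    """Closed-form solution of max_pi E_{a~pi}[Q(s, a)] - eta * KL(pi || pi_old),
    pointwise in the state s, for pi_old(a|s) = N(K s + k, Sigma) and
    Q(s, a) = 1/2 a^T Q_aa a + a^T Q_as s + a^T q_a + (terms independent of a).

    Returns the new linear-Gaussian parameters (K_new, k_new, Sigma_new).
    """
    prec_old = np.linalg.inv(Sigma)
    # Completing the square in a gives a Gaussian with this matrix;
    # eta must be large enough that eta * Sigma^{-1} - Q_aa is positive definite.
    F = np.linalg.inv(eta * prec_old - Q_aa)
    Sigma_new = eta * F
    # The new mean stays linear in s: mu_new(s) = K_new s + k_new.
    K_new = F @ (eta * prec_old @ K + Q_as)
    k_new = F @ (eta * prec_old @ k + q_a)
    return K_new, k_new, Sigma_new


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ds, da = 3, 2                               # state / action dimensions
    K = rng.normal(size=(da, ds))
    k = rng.normal(size=da)
    Sigma = np.eye(da)

    A = rng.normal(size=(da, da))
    Q_aa = -(A @ A.T) - 0.1 * np.eye(da)        # negative definite curvature in a
    Q_as = rng.normal(size=(da, ds))
    q_a = rng.normal(size=da)

    K_new, k_new, Sigma_new = kl_regularized_update(K, k, Sigma, Q_aa, Q_as, q_a, eta=5.0)
    print("new gain:\n", K_new)
    print("new covariance:\n", Sigma_new)

This illustrates only the per-time-step policy improvement under a single KL penalty; the full algorithm described in the paper additionally fits the quadratic Q-model from samples and propagates it backwards in time.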

Cite

Text

Akrour et al. "Model-Free Trajectory Optimization for Reinforcement Learning." International Conference on Machine Learning, 2016.

Markdown

[Akrour et al. "Model-Free Trajectory Optimization for Reinforcement Learning." International Conference on Machine Learning, 2016.](https://mlanthology.org/icml/2016/akrour2016icml-modelfree/)

BibTeX

@inproceedings{akrour2016icml-modelfree,
  title     = {{Model-Free Trajectory Optimization for Reinforcement Learning}},
  author    = {Akrour, Riad and Neumann, Gerhard and Abdulsamad, Hany and Abdolmaleki, Abbas},
  booktitle = {International Conference on Machine Learning},
  year      = {2016},
  pages     = {2961--2970},
  volume    = {48},
  url       = {https://mlanthology.org/icml/2016/akrour2016icml-modelfree/}
}