Faster Policy Learning with Continuous-Time Gradients

Abstract

We study the estimation of policy gradients for continuous-time systems with known dynamics. By reframing policy learning in continuous time, we show that it is possible to construct a more efficient and accurate gradient estimator. The standard back-propagation through time (BPTT) estimator computes exact gradients for a crude discretization of the continuous-time system. In contrast, we approximate continuous-time gradients in the original system. With the explicit goal of estimating continuous-time gradients, we are able to discretize adaptively and construct a more efficient policy gradient estimator, which we call the Continuous-Time Policy Gradient (CTPG). We show that replacing BPTT policy gradients with more efficient CTPG estimates results in faster and more robust learning across a variety of control tasks and simulators.
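The distinction between the two estimators can be illustrated with a toy problem. The sketch below is not the paper's implementation; it assumes scalar linear dynamics dx/dt = (a - theta) x with a quadratic running cost, and all names and parameters are hypothetical. It contrasts the exact gradient of a fixed-step Euler discretization (what BPTT computes) with a continuous-time gradient obtained by solving the adjoint ODE using an adaptive integrator (SciPy's solve_ivp), in the spirit of the continuous-time view described above.

# Minimal sketch (assumed toy system, not the authors' code): compare
#  (1) BPTT -- exact gradient of a fixed-step Euler discretization, and
#  (2) a continuous-time adjoint gradient solved with an adaptive ODE solver.
import numpy as np
from scipy.integrate import solve_ivp

a, theta, x0, T = -0.3, 0.8, 1.0, 5.0    # closed-loop dynamics: dx/dt = (a - theta) x
cost = lambda x: x ** 2                  # running cost l(x) = x^2, objective J = int_0^T l(x) dt

def bptt_gradient(h=0.05):
    """Exact gradient of the Euler-discretized objective (what BPTT returns)."""
    n = int(T / h)
    xs = np.empty(n + 1); xs[0] = x0
    for k in range(n):                                   # forward Euler rollout
        xs[k + 1] = xs[k] + h * (a - theta) * xs[k]
    J = h * np.sum(cost(xs[:-1]))
    lam, grad = 0.0, 0.0
    for k in reversed(range(n)):                         # reverse pass through the recursion
        grad += lam * (-h * xs[k])                       # d x_{k+1} / d theta = -h * x_k
        lam = 2.0 * h * xs[k] + lam * (1.0 + h * (a - theta))
    return J, grad

def adjoint_gradient(rtol=1e-8):
    """Continuous-time gradient via the adjoint ODE, integrated adaptively."""
    fwd = solve_ivp(lambda t, x: (a - theta) * x, (0.0, T), [x0],
                    dense_output=True, rtol=rtol, atol=rtol)
    x_of_t = lambda t: fwd.sol(t)[0]
    # Backward state [lambda, grad]:
    #   d lambda/dt = -(dl/dx + lambda * df/dx) = -(2x + lambda (a - theta)),  lambda(T) = 0
    #   d grad/dt accumulates lambda * df/dtheta = lambda * (-x)
    def back(t, s):
        lam, _ = s
        x = x_of_t(t)
        return [-(2.0 * x + lam * (a - theta)), lam * (-x)]
    bwd = solve_ivp(back, (T, 0.0), [0.0, 0.0], rtol=rtol, atol=rtol)
    J = solve_ivp(lambda t, s: [cost(x_of_t(t))], (0.0, T), [0.0],
                  rtol=rtol, atol=rtol).y[0, -1]
    return J, -bwd.y[1, -1]              # sign flip: integrated from T back to 0

print("BPTT (h=0.05)      :", bptt_gradient())
print("Continuous adjoint :", adjoint_gradient())

For this linear example both estimates agree up to discretization error, but the adaptive solver chooses its own step sizes to meet a tolerance, whereas the BPTT gradient is tied to the fixed grid of the Euler rollout.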

Cite

Text

Ainsworth et al. "Faster Policy Learning with Continuous-Time Gradients." Proceedings of the 3rd Conference on Learning for Dynamics and Control, 2021.

Markdown

[Ainsworth et al. "Faster Policy Learning with Continuous-Time Gradients." Proceedings of the 3rd Conference on Learning for Dynamics and Control, 2021.](https://mlanthology.org/l4dc/2021/ainsworth2021l4dc-faster/)

BibTeX

@inproceedings{ainsworth2021l4dc-faster,
  title     = {{Faster Policy Learning with Continuous-Time Gradients}},
  author    = {Ainsworth, Samuel and Lowrey, Kendall and Thickstun, John and Harchaoui, Zaid and Srinivasa, Siddhartha},
  booktitle = {Proceedings of the 3rd Conference on Learning for Dynamics and Control},
  year      = {2021},
  pages     = {1054--1067},
  volume    = {144},
  url       = {https://mlanthology.org/l4dc/2021/ainsworth2021l4dc-faster/}
}