Policy Gradient in Continuous Time

Abstract

Policy search is a method for approximately solving an optimal control problem by performing a parametric optimization over a given class of parameterized policies. In order to apply a local optimization technique, such as a gradient method, we wish to evaluate the sensitivity of the performance measure with respect to the policy parameters, the so-called policy gradient. This paper is concerned with the estimation of the policy gradient for continuous-time, deterministic state dynamics, in a reinforcement learning framework, that is, when the decision maker does not have a model of the state dynamics.
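To make the setting concrete, here is a minimal toy sketch of model-free policy-gradient estimation in continuous time. It is not the paper's estimator: it simply Euler-integrates an assumed deterministic system dx/dt = u under a one-parameter linear policy u = -theta*x, and approximates the policy gradient dJ/dtheta by central finite differences over trajectory rollouts (so no model of the dynamics is used by the gradient estimator itself). All names, the dynamics, and the quadratic cost are illustrative assumptions.

```python
import math

def simulate_cost(theta, x0=1.0, T=5.0, dt=1e-3):
    """Roll out the toy deterministic dynamics dx/dt = u with the
    parameterized policy u = -theta * x (Euler integration), and
    accumulate the performance measure J = integral of x^2 dt."""
    x, J = x0, 0.0
    for _ in range(int(T / dt)):
        J += x * x * dt          # running cost
        x += (-theta * x) * dt   # Euler step of the state dynamics
    return J

def finite_difference_gradient(theta, eps=1e-4):
    """Central-difference estimate of the policy gradient dJ/dtheta.
    Model-free in the sense that it only queries trajectory rollouts,
    never the dynamics equations directly."""
    return (simulate_cost(theta + eps) - simulate_cost(theta - eps)) / (2 * eps)

# For this linear-quadratic toy problem J(theta) has the closed form
# x0^2 * (1 - exp(-2*theta*T)) / (2*theta), so the estimate can be checked.
grad = finite_difference_gradient(1.0)
```

For this toy problem the closed-form gradient at theta = 1 is about -0.5, so the finite-difference estimate can be sanity-checked against it; the paper itself develops more refined gradient estimates for this continuous-time, model-free setting.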

Cite

Text

Munos. "Policy Gradient in Continuous Time." Journal of Machine Learning Research, 2006.

Markdown

[Munos. "Policy Gradient in Continuous Time." Journal of Machine Learning Research, 2006.](https://mlanthology.org/jmlr/2006/munos2006jmlr-policy/)

BibTeX

@article{munos2006jmlr-policy,
  title     = {{Policy Gradient in Continuous Time}},
  author    = {Munos, Rémi},
  journal   = {Journal of Machine Learning Research},
  year      = {2006},
  pages     = {771--791},
  volume    = {7},
  url       = {https://mlanthology.org/jmlr/2006/munos2006jmlr-policy/}
}