Policy Gradient in Continuous Time
Abstract
Policy search is a method for approximately solving an optimal control problem by performing a parametric optimization within a given class of parameterized policies. In order to apply a local optimization technique, such as a gradient method, we wish to evaluate the sensitivity of the performance measure with respect to the policy parameters, the so-called policy gradient. This paper is concerned with the estimation of the policy gradient for continuous-time, deterministic state dynamics, in a reinforcement learning framework, that is, when the decision maker does not have a model of the state dynamics.
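To make the setting concrete, here is a minimal sketch (not the paper's method) of policy-gradient estimation for a continuous-time deterministic system: the dynamics are treated as a black box, integrated with Euler steps, and the gradient of the performance measure with respect to the policy parameter is estimated by finite differences. The system, policy class, and reward below are illustrative assumptions.

```python
import numpy as np

def rollout_return(theta, x0=1.0, dt=0.01, T=2.0):
    """Simulate the (assumed) dynamics x' = -x + u under the linear
    policy u = theta * x, and return the time-integrated performance
    measure J(theta) (a quadratic cost, negated so higher is better)."""
    x, J = x0, 0.0
    for _ in range(int(T / dt)):
        u = theta * x                 # parameterized policy
        J += -(x**2 + u**2) * dt      # accumulate running reward
        x += (-x + u) * dt            # Euler step of the dynamics
    return J

def policy_gradient_fd(theta, eps=1e-4):
    """Central finite-difference estimate of dJ/dtheta, using only
    rollouts of the black-box dynamics (no model required)."""
    return (rollout_return(theta + eps) - rollout_return(theta - eps)) / (2 * eps)

grad = policy_gradient_fd(theta=-0.5)
```

A gradient method would then update `theta` in the direction of `grad`; the paper's contribution concerns estimating this gradient more carefully in the continuous-time, model-free setting.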
Cite
Text
Munos. "Policy Gradient in Continuous Time." Journal of Machine Learning Research, 2006.
Markdown
[Munos. "Policy Gradient in Continuous Time." Journal of Machine Learning Research, 2006.](https://mlanthology.org/jmlr/2006/munos2006jmlr-policy/)
BibTeX
@article{munos2006jmlr-policy,
title = {{Policy Gradient in Continuous Time}},
author = {Munos, Rémi},
journal = {Journal of Machine Learning Research},
year = {2006},
pages = {771-791},
volume = {7},
url = {https://mlanthology.org/jmlr/2006/munos2006jmlr-policy/}
}