Reinforcement Learning for Continuous Stochastic Control Problems
Abstract
This paper is concerned with the problem of Reinforcement Learning (RL) for continuous state space and time stochastic control problems. We state the Hamilton-Jacobi-Bellman equation satisfied by the value function and use a Finite-Difference method for designing a convergent approximation scheme. Then we propose a RL algorithm based on this scheme and prove its convergence to the optimal solution.
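The abstract's approach can be illustrated with a minimal sketch of a finite-difference discretization of a Hamilton-Jacobi-Bellman equation, in the spirit of Kushner-style Markov chain approximations. The toy problem below (1-D controlled drift `dx = a dt + sigma dW` on [0, 1], quadratic running reward, reflecting boundaries) and all parameter values are hypothetical illustrations, not taken from the paper; the paper's actual scheme and convergence proof are more general.

```python
import numpy as np

# Hypothetical 1-D example (not from the paper): dynamics
# dx = a dt + sigma dW on [0, 1], running reward r(x) = -(x - 0.5)^2,
# discount rate rho. Upwind finite differences turn the HJB equation
# into a controlled Markov chain on a grid, solved by value iteration.
N = 51                       # number of grid points
h = 1.0 / (N - 1)            # grid spacing
x = np.linspace(0.0, 1.0, N)
actions = [-1.0, 0.0, 1.0]   # admissible controls
sigma = 0.2                  # diffusion coefficient
rho = 1.0                    # discount rate
r = -(x - 0.5) ** 2          # running reward on the grid

V = np.zeros(N)
for _ in range(5000):
    Q = []
    for a in actions:
        # Interpolation time step and upwind transition probabilities:
        # the drift term moves probability toward the neighbor in its
        # own direction, the diffusion term splits symmetrically.
        dt = h ** 2 / (sigma ** 2 + h * abs(a))
        p_up = (0.5 * sigma ** 2 + h * max(a, 0.0)) / (sigma ** 2 + h * abs(a))
        p_dn = 1.0 - p_up
        Vu = np.roll(V, -1); Vu[-1] = V[-1]   # reflecting right boundary
        Vd = np.roll(V, 1);  Vd[0] = V[0]     # reflecting left boundary
        # One-step dynamic programming backup on the approximating chain.
        Q.append(r * dt + (p_up * Vu + p_dn * Vd) / (1.0 + rho * dt))
    V_new = np.max(Q, axis=0)                 # maximize over controls
    if np.max(np.abs(V_new - V)) < 1e-8:      # value iteration converged
        V = V_new
        break
    V = V_new
```

Because the per-step discount `1 / (1 + rho * dt)` is strictly below one, the backup is a contraction and value iteration converges; in this symmetric toy problem the resulting value function peaks at the center of the interval, where the running reward is largest.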
Cite
Text
Munos and Bourgine. "Reinforcement Learning for Continuous Stochastic Control Problems." Neural Information Processing Systems, 1997.
Markdown
[Munos and Bourgine. "Reinforcement Learning for Continuous Stochastic Control Problems." Neural Information Processing Systems, 1997.](https://mlanthology.org/neurips/1997/munos1997neurips-reinforcement/)
BibTeX
@inproceedings{munos1997neurips-reinforcement,
title = {{Reinforcement Learning for Continuous Stochastic Control Problems}},
author = {Munos, Rémi and Bourgine, Paul},
booktitle = {Neural Information Processing Systems},
year = {1997},
pages = {1029-1035},
url = {https://mlanthology.org/neurips/1997/munos1997neurips-reinforcement/}
}