Reinforcement Learning for Continuous Stochastic Control Problems

Abstract

This paper is concerned with the problem of Reinforcement Learning (RL) for continuous state space and time stochastic control problems. We state the Hamilton-Jacobi-Bellman equation satisfied by the value function and use a Finite-Difference method for designing a convergent approximation scheme. Then we propose an RL algorithm based on this scheme and prove its convergence to the optimal solution.
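For context, the Hamilton-Jacobi-Bellman equation mentioned in the abstract takes, for a discounted continuous-time stochastic control problem with dynamics $dx = f(x,u)\,dt + \sigma(x,u)\,dW$, reward $r$, and discount rate $\lambda$, the generic form (this is the standard statement, not necessarily the paper's exact notation):

$$\lambda V(x) = \sup_{u \in U} \Big[ r(x,u) + f(x,u)^\top \nabla V(x) + \tfrac{1}{2}\,\mathrm{tr}\big(\sigma(x,u)\sigma(x,u)^\top \nabla^2 V(x)\big) \Big]$$

A finite-difference discretization of such an equation yields a Markov-decision-process approximation that can be solved by dynamic programming. The sketch below illustrates this idea with a Kushner-Dupuis-style upwind scheme on a hypothetical 1-D problem; all dynamics, rewards, and parameters here are illustrative assumptions, not the scheme from the paper:

```python
import numpy as np

# Hypothetical illustrative problem (not the paper's):
# dynamics dx = u dt + sigma dW, reward r(x) = -x^2, discount rate lam.
sigma, lam = 0.5, 1.0
h = 0.1                                  # grid spacing
xs = np.arange(-1.0, 1.0 + h / 2, h)     # state grid on [-1, 1]
n = len(xs)
controls = [-1.0, 0.0, 1.0]              # finite control set

V = np.zeros(n)
for _ in range(2000):                    # value iteration to a fixed point
    V_new = np.full(n, -np.inf)
    for u in controls:
        f = u                            # drift of the controlled diffusion
        Q = sigma**2 + h * abs(f)        # upwind normalizer
        dt = h**2 / Q                    # interpolation time step
        p_up = (sigma**2 / 2 + h * max(f, 0.0)) / Q   # P(move to x + h)
        p_dn = (sigma**2 / 2 + h * max(-f, 0.0)) / Q  # P(move to x - h)
        gamma = np.exp(-lam * dt)        # per-step discount factor
        # reflecting boundaries: clip neighbor indices to the grid
        up = np.minimum(np.arange(n) + 1, n - 1)
        dn = np.maximum(np.arange(n) - 1, 0)
        cand = -xs**2 * dt + gamma * (p_up * V[up] + p_dn * V[dn])
        V_new = np.maximum(V_new, cand)  # maximize over controls
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new
```

The grid spacing `h` controls both the state discretization and the time step `dt = h**2 / Q`; as `h` shrinks, this class of schemes converges to the continuous-time solution under standard conditions, which is the type of convergence property the paper analyzes for its RL algorithm.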

Cite

Text

Munos and Bourgine. "Reinforcement Learning for Continuous Stochastic Control Problems." Neural Information Processing Systems, 1997.

Markdown

[Munos and Bourgine. "Reinforcement Learning for Continuous Stochastic Control Problems." Neural Information Processing Systems, 1997.](https://mlanthology.org/neurips/1997/munos1997neurips-reinforcement/)

BibTeX

@inproceedings{munos1997neurips-reinforcement,
  title     = {{Reinforcement Learning for Continuous Stochastic Control Problems}},
  author    = {Munos, Rémi and Bourgine, Paul},
  booktitle = {Neural Information Processing Systems},
  year      = {1997},
  pages     = {1029--1035},
  url       = {https://mlanthology.org/neurips/1997/munos1997neurips-reinforcement/}
}