Multi-Grid Methods for Reinforcement Learning in Controlled Diffusion Processes

Abstract

Reinforcement learning methods for discrete and semi-Markov decision problems, such as Real-Time Dynamic Programming, can be generalized to Controlled Diffusion Processes. The optimal control problem reduces to a boundary value problem for a fully nonlinear second-order elliptic differential equation of Hamilton-Jacobi-Bellman (HJB) type. Numerical analysis provides multi-grid methods for this kind of equation. In the case of learning control, however, the systems of equations on the various grid levels are obtained using observed information (transitions and local cost). To ensure consistency, special attention must be directed toward the type of time and space discretization during observation. An algorithm for multi-grid observation is proposed. The multi-grid algorithm is demonstrated on a simple queuing problem.
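To illustrate the coarse-to-fine idea behind multi-grid dynamic programming (not the paper's own algorithm), the sketch below solves a toy one-dimensional shortest-path problem by value iteration: first on a coarse grid with half the states, then on the fine grid after linearly interpolating the coarse solution as a warm start. All names (`value_iteration`, the grid size `N`, the unit step costs) are illustrative assumptions, not from the paper.

```python
import numpy as np

def value_iteration(cost, V0, tol=1e-8, max_iter=10_000):
    """Value iteration for a 1-D walk toward the absorbing state 0.

    Bellman update: V[i] = cost + min(V[i-1], V[i+1]) for interior i,
    with V[0] = 0 fixed (the goal) and a reflecting right boundary.
    Returns the converged values and the number of sweeps used.
    """
    V = V0.copy()
    for it in range(max_iter):
        Vn = V.copy()
        Vn[1:-1] = cost + np.minimum(V[:-2], V[2:])
        Vn[-1] = cost + V[-2]   # reflecting right boundary
        Vn[0] = 0.0             # absorbing goal state
        if np.max(np.abs(Vn - V)) < tol:
            return Vn, it + 1
        V = Vn
    return V, max_iter

N = 64  # number of fine-grid intervals (hypothetical choice)

# 1) Coarse solve: half the states; one coarse step covers two fine
#    steps, so the per-step cost doubles.
Vc, _ = value_iteration(2.0, np.zeros(N // 2 + 1))

# 2) Prolongate: linear interpolation of coarse values onto the fine grid.
V0 = np.interp(np.arange(N + 1), 2 * np.arange(N // 2 + 1), Vc)
V_mg, it_mg = value_iteration(1.0, V0)

# 3) Baseline: cold start on the fine grid alone.
V_flat, it_flat = value_iteration(1.0, np.zeros(N + 1))
```

With a cold start, information about the goal propagates one state per sweep, so the fine grid needs on the order of `N` sweeps; the coarse warm start already encodes the global shape of the value function, so the fine-grid phase converges in far fewer sweeps. This is the basic motivation for doing part of the work on coarser discretizations.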

Cite

Text

Pareigis. "Multi-Grid Methods for Reinforcement Learning in Controlled Diffusion Processes." Neural Information Processing Systems, 1996.

Markdown

[Pareigis. "Multi-Grid Methods for Reinforcement Learning in Controlled Diffusion Processes." Neural Information Processing Systems, 1996.](https://mlanthology.org/neurips/1996/pareigis1996neurips-multigrid/)

BibTeX

@inproceedings{pareigis1996neurips-multigrid,
  title     = {{Multi-Grid Methods for Reinforcement Learning in Controlled Diffusion Processes}},
  author    = {Pareigis, Stephan},
  booktitle = {Neural Information Processing Systems},
  year      = {1996},
  pages     = {1033-1039},
  url       = {https://mlanthology.org/neurips/1996/pareigis1996neurips-multigrid/}
}