Multi-Grid Methods for Reinforcement Learning in Controlled Diffusion Processes
Abstract
Reinforcement learning methods for discrete and semi-Markov decision problems such as Real-Time Dynamic Programming can be generalized for Controlled Diffusion Processes. The optimal control problem reduces to a boundary value problem for a fully nonlinear second-order elliptic differential equation of Hamilton-Jacobi-Bellman (HJB) type. Numerical analysis provides multi-grid methods for this kind of equation. In the case of Learning Control, however, the systems of equations on the various grid-levels are obtained using observed information (transitions and local cost). To ensure consistency, special attention needs to be directed toward the type of time and space discretization during the observation. An algorithm for multi-grid observation is proposed. The multi-grid algorithm is demonstrated on a simple queuing problem.
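The core idea the abstract describes — discretize the controlled diffusion on a grid, solve the resulting system of Bellman equations, and use coarse-grid solutions to accelerate fine-grid solves — can be illustrated with a minimal two-grid sketch. The example below is an assumption-laden toy, not the paper's algorithm: it uses a 1-D diffusion dx = u dt + σ dW on [0, 1] with an upwind (Kushner-Dupuis-style) Markov-chain approximation, a made-up running cost 1 + x², absorbing zero-cost boundaries, and a three-point control set.

```python
import numpy as np

def solve_hjb_1d(n, v0=None, tol=1e-8, max_iter=100000):
    """Value iteration for a discretized 1-D controlled diffusion on [0, 1].

    Illustrative sketch only: dynamics dx = u dt + sigma dW with
    controls u in {-1, 0, 1}, running cost 1 + x^2 (assumed), and
    absorbing boundaries with zero terminal cost.
    """
    h = 1.0 / n
    sigma2 = 0.5          # sigma^2, assumed diffusion strength
    x = np.linspace(0.0, 1.0, n + 1)
    v = np.zeros(n + 1) if v0 is None else v0.copy()
    controls = (-1.0, 0.0, 1.0)
    for _ in range(max_iter):
        v_new = v.copy()
        for i in range(1, n):            # interior grid points only
            best = np.inf
            for u in controls:
                # upwind transition probabilities of the approximating chain
                denom = sigma2 + h * abs(u)
                dt = h * h / denom       # local time step
                p_up = (sigma2 / 2 + h * max(u, 0.0)) / denom
                p_dn = (sigma2 / 2 + h * max(-u, 0.0)) / denom
                cand = (1.0 + x[i] ** 2) * dt + p_up * v[i + 1] + p_dn * v[i - 1]
                best = min(best, cand)
            v_new[i] = best
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v

# Two-grid acceleration: solve cheaply on a coarse grid, interpolate
# the coarse value function, and use it to warm-start the fine grid.
v_coarse = solve_hjb_1d(8)
v_init = np.interp(np.linspace(0, 1, 17), np.linspace(0, 1, 9), v_coarse)
v_fine = solve_hjb_1d(16, v0=v_init)
```

Because value iteration converges to a unique fixed point here, the warm-started fine-grid solve reaches the same solution as a cold start, only in fewer sweeps — which is the payoff the multi-grid hierarchy aims for.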
Cite

Pareigis. "Multi-Grid Methods for Reinforcement Learning in Controlled Diffusion Processes." Neural Information Processing Systems, 1996.
BibTeX
@inproceedings{pareigis1996neurips-multigrid,
title = {{Multi-Grid Methods for Reinforcement Learning in Controlled Diffusion Processes}},
author = {Pareigis, Stephan},
booktitle = {Neural Information Processing Systems},
year = {1996},
pages = {1033-1039},
url = {https://mlanthology.org/neurips/1996/pareigis1996neurips-multigrid/}
}