Reinforcement Learning Methods for Continuous-Time Markov Decision Problems
Abstract
Semi-Markov Decision Problems are continuous time generalizations of discrete time Markov Decision Problems. A number of reinforcement learning algorithms have been developed recently for the solution of Markov Decision Problems, based on the ideas of asynchronous dynamic programming and stochastic approximation. Among these are TD(λ), Q-learning, and Real-time Dynamic Programming. After reviewing semi-Markov Decision Problems and Bellman's optimality equation in that context, we propose algorithms similar to those named above, adapted to the solution of semi-Markov Decision Problems. We demonstrate these algorithms by applying them to the problem of determining the optimal control for a simple queueing system. We conclude with a discussion of circumstances under which these algorithms may be usefully applied.
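As a rough illustration of the kind of adaptation the abstract describes, the sketch below shows a tabular Q-learning update for a semi-Markov transition, where discounting depends on the random sojourn time. This is a minimal sketch, not the paper's exact formulation: it assumes a constant reward rate during each sojourn, and the names `alpha`, `beta`, `reward_rate`, and `tau` are illustrative.

```python
import numpy as np

def smdp_q_update(Q, s, a, reward_rate, tau, s_next, alpha=0.1, beta=0.05):
    """One Q-learning step for a semi-Markov transition (illustrative sketch).

    Q           : 2-D array, Q[state, action] value estimates
    s, a        : state observed and action taken
    reward_rate : reward accrued per unit time while the transition lasted
                  (assumed constant over the sojourn)
    tau         : sojourn time of the transition in the continuous-time process
    s_next      : state observed after the transition
    alpha       : learning-rate step size
    beta        : continuous-time discount rate
    """
    discount = np.exp(-beta * tau)                       # e^{-beta * tau}
    lump_reward = reward_rate * (1.0 - discount) / beta  # discounted reward over the sojourn
    target = lump_reward + discount * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```

The key difference from the discrete-time update is that both the effective discount factor and the accumulated reward depend on the observed sojourn time `tau`, rather than on a fixed per-step discount.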
Cite
Text
Bradtke and Duff. "Reinforcement Learning Methods for Continuous-Time Markov Decision Problems." Neural Information Processing Systems, 1994.
Markdown
[Bradtke and Duff. "Reinforcement Learning Methods for Continuous-Time Markov Decision Problems." Neural Information Processing Systems, 1994.](https://mlanthology.org/neurips/1994/bradtke1994neurips-reinforcement/)
BibTeX
@inproceedings{bradtke1994neurips-reinforcement,
title = {{Reinforcement Learning Methods for Continuous-Time Markov Decision Problems}},
author = {Bradtke, Steven J. and Duff, Michael O.},
booktitle = {Neural Information Processing Systems},
year = {1994},
pages = {393-400},
url = {https://mlanthology.org/neurips/1994/bradtke1994neurips-reinforcement/}
}