Reinforcement Learning Methods for Continuous-Time Markov Decision Problems

Abstract

Semi-Markov Decision Problems are continuous-time generalizations of discrete-time Markov Decision Problems. A number of reinforcement learning algorithms have been developed recently for the solution of Markov Decision Problems, based on the ideas of asynchronous dynamic programming and stochastic approximation. Among these are TD(λ), Q-learning, and Real-time Dynamic Programming. After reviewing semi-Markov Decision Problems and Bellman's optimality equation in that context, we propose algorithms similar to those named above, adapted to the solution of semi-Markov Decision Problems. We demonstrate these algorithms by applying them to the problem of determining the optimal control for a simple queueing system. We conclude with a discussion of circumstances under which these algorithms may be usefully applied.
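In the semi-Markov setting, the fixed per-step discount factor of ordinary Q-learning is replaced by one that depends on the random transition (sojourn) time. The sketch below illustrates this kind of update in Python under stated assumptions: a constant reward rate during each sojourn and continuous-time discounting at rate beta. The environment interface (`env.reset`, `env.step` returning next state, reward rate, and sojourn time) is hypothetical and is not taken from the paper; this is a minimal illustration of the idea, not the authors' implementation.

```python
import math
import random
from collections import defaultdict

def smdp_q_update(Q, s, a, reward_rate, tau, s_next, actions, alpha=0.1, beta=0.5):
    """One tabular Q-learning backup for a semi-Markov decision problem.

    Assumes a constant reward rate over a sojourn of length tau and
    continuous-time discounting at rate beta, so the discounted reward
    accumulated during the sojourn is reward_rate * (1 - exp(-beta*tau)) / beta
    and the next state's value is discounted by exp(-beta*tau).
    """
    discount = math.exp(-beta * tau)
    target = (reward_rate * (1.0 - discount) / beta
              + discount * max(Q[(s_next, b)] for b in actions))
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def run_episode(env, actions, Q=None, alpha=0.1, beta=0.5, eps=0.1, steps=1000):
    """Epsilon-greedy control loop on a simulated SMDP (hypothetical env API)."""
    Q = Q if Q is not None else defaultdict(float)
    s = env.reset()
    for _ in range(steps):
        if random.random() < eps:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda b: Q[(s, b)])
        # env.step is assumed to return (next_state, reward_rate, sojourn_time).
        s_next, reward_rate, tau = env.step(a)
        smdp_q_update(Q, s, a, reward_rate, tau, s_next, actions, alpha, beta)
        s = s_next
    return Q
```

For a queueing example like the one mentioned in the abstract, the state would be the queue occupancy, the actions admission or routing decisions, and tau the (exponentially or generally distributed) time until the next arrival or service completion.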

Cite

Text

Bradtke and Duff. "Reinforcement Learning Methods for Continuous-Time Markov Decision Problems." Neural Information Processing Systems, 1994.

Markdown

[Bradtke and Duff. "Reinforcement Learning Methods for Continuous-Time Markov Decision Problems." Neural Information Processing Systems, 1994.](https://mlanthology.org/neurips/1994/bradtke1994neurips-reinforcement/)

BibTeX

@inproceedings{bradtke1994neurips-reinforcement,
  title     = {{Reinforcement Learning Methods for Continuous-Time Markov Decision Problems}},
  author    = {Bradtke, Steven J. and Duff, Michael O.},
  booktitle = {Neural Information Processing Systems},
  year      = {1994},
  pages     = {393-400},
  url       = {https://mlanthology.org/neurips/1994/bradtke1994neurips-reinforcement/}
}