Monte-Carlo Algorithms for the Improvement of Finite-State Stochastic Controllers: Application to Bayes-Adaptive Markov Decision Processes

Abstract

We consider the problem of "optimal learning" for Markov decision processes with uncertain transition probabilities. Motivated by the correspondence between these processes and partially-observable Markov decision processes, we adopt policies expressed as finite-state stochastic automata, and we propose policy improvement algorithms that utilize Monte-Carlo techniques for gradient estimation and ascent.
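To make the approach concrete, here is a minimal sketch (not the paper's exact algorithm) of REINFORCE-style Monte-Carlo gradient ascent on a finite-state stochastic controller for a toy MDP whose transition probabilities are uncertain under a Dirichlet prior. All problem sizes, priors, parameterizations, and step sizes are illustrative assumptions; sampling a model from the prior at the start of each episode stands in for the Bayes-adaptive expectation over transition parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

S, A, N = 3, 2, 2           # states, actions, controller nodes (assumed sizes)
gamma, horizon = 0.95, 30   # discount factor and episode length (assumed)
R = rng.uniform(0.0, 1.0, size=(S, A))   # known reward function (assumed)
alpha = np.ones((S, A, S))               # Dirichlet prior over transition rows

# Controller parameters: softmax logits for action choice and node transitions.
theta_act = np.zeros((N, A))        # P(action | node)
theta_node = np.zeros((N, S, N))    # P(next node | node, observed next state)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def rollout():
    """Sample a model from the prior, run the controller once,
    and return the discounted return plus grad-log-prob accumulators."""
    P = np.array([[rng.dirichlet(alpha[s, a]) for a in range(A)]
                  for s in range(S)])
    g_act = np.zeros_like(theta_act)
    g_node = np.zeros_like(theta_node)
    s, n, G, disc = rng.integers(S), 0, 0.0, 1.0
    for _ in range(horizon):
        pa = softmax(theta_act[n])
        a = rng.choice(A, p=pa)
        g_act[n] += np.eye(A)[a] - pa            # grad log softmax (action)
        G += disc * R[s, a]
        disc *= gamma
        s = rng.choice(S, p=P[s, a])             # observe the next state
        pn = softmax(theta_node[n, s])
        n_next = rng.choice(N, p=pn)
        g_node[n, s] += np.eye(N)[n_next] - pn   # grad log softmax (node)
        n = n_next
    return G, g_act, g_node

lr, batch = 0.05, 50
for it in range(200):
    returns = []
    GA, GN = np.zeros_like(theta_act), np.zeros_like(theta_node)
    for _ in range(batch):
        G, ga, gn = rollout()
        returns.append(G)
        GA += G * ga
        GN += G * gn
    theta_act += lr * GA / batch                 # Monte-Carlo gradient ascent
    theta_node += lr * GN / batch
    if it % 50 == 0:
        print(f"iter {it:3d}  avg return {np.mean(returns):.3f}")
```

Because the controller conditions only on observable quantities (its internal node and the observed state), averaging returns over models sampled from the prior yields an unbiased Monte-Carlo estimate of the Bayes-expected return, which is what the gradient ascent step climbs in this sketch.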

Cite

Text

Duff. "Monte-Carlo Algorithms for the Improvement of Finite-State Stochastic Controllers: Application to Bayes-Adaptive Markov Decision Processes." Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics, 2001.

Markdown

[Duff. "Monte-Carlo Algorithms for the Improvement of Finite-State Stochastic Controllers: Application to Bayes-Adaptive Markov Decision Processes." Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics, 2001.](https://mlanthology.org/aistats/2001/duff2001aistats-montecarlo/)

BibTeX

@inproceedings{duff2001aistats-montecarlo,
  title     = {{Monte-Carlo Algorithms for the Improvement of Finite-State Stochastic Controllers: Application to Bayes-Adaptive Markov Decision Processes}},
  author    = {Duff, Michael O.},
  booktitle = {Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics},
  year      = {2001},
  pages     = {93--97},
  volume    = {R3},
  url       = {https://mlanthology.org/aistats/2001/duff2001aistats-montecarlo/}
}