Monte-Carlo Algorithms for the Improvement of Finite-State Stochastic Controllers: Application to Bayes-Adaptive Markov Decision Processes
Abstract
We consider the problem of "optimal learning" for Markov decision processes with uncertain transition probabilities. Motivated by the correspondence between these processes and partially-observable Markov decision processes, we adopt policies expressed as finite-state stochastic automata, and we propose policy improvement algorithms that utilize Monte-Carlo techniques for gradient estimation and ascent.
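The paper's approach, Monte-Carlo gradient estimation for a finite-state stochastic controller, can be illustrated with a minimal score-function (REINFORCE-style) sketch. Everything below is a hypothetical toy construction, not the paper's actual algorithm or experiments: a 2-state MDP with known rewards, a controller whose action probabilities are a softmax of a parameter tensor `theta`, and a simple memory-node update that remembers the last action.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup (not from the paper): a 2-state MDP, 2 actions,
# and a stochastic controller with 2 internal memory nodes.
N_ENV, N_NODE, N_ACT = 2, 2, 2
GAMMA, HORIZON, BATCH = 0.95, 30, 200

# Environment: P[a, s, s'] transition probabilities, R[s, a] rewards.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def rollout(theta):
    """Sample one trajectory; return its discounted return and the
    accumulated score (gradient of the trajectory's log-probability)."""
    s, q = 0, 0                          # environment state, controller node
    G, score = 0.0, np.zeros_like(theta)
    for t in range(HORIZON):
        probs = softmax(theta[q, s])     # action distribution at (node, state)
        a = rng.choice(N_ACT, p=probs)
        grad_log = -probs                # d/dtheta log pi(a | q, s)
        grad_log[a] += 1.0
        score[q, s] += grad_log
        G += (GAMMA ** t) * R[s, a]
        s = rng.choice(N_ENV, p=P[a, s])
        q = a                            # toy node update: remember last action
    return G, score

def reinforce_step(theta, lr=0.05):
    """One Monte-Carlo policy-gradient ascent step over a batch of rollouts."""
    grad = np.zeros_like(theta)
    returns = []
    for _ in range(BATCH):
        G, score = rollout(theta)
        returns.append(G)
        grad += G * score
    grad /= BATCH
    return theta + lr * grad, float(np.mean(returns))

theta = np.zeros((N_NODE, N_ENV, N_ACT))
theta, avg_return_0 = reinforce_step(theta)
theta, avg_return_1 = reinforce_step(theta)
print(avg_return_0, avg_return_1)
```

In the Bayes-adaptive setting of the paper, the uncertain transition probabilities would themselves be part of an augmented (hyper)state; the sketch above only shows the generic likelihood-ratio gradient machinery on a fully known toy MDP.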
Cite
Text
Duff. "Monte-Carlo Algorithms for the Improvement of Finite-State Stochastic Controllers: Application to Bayes-Adaptive Markov Decision Processes." Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics, 2001.

Markdown

[Duff. "Monte-Carlo Algorithms for the Improvement of Finite-State Stochastic Controllers: Application to Bayes-Adaptive Markov Decision Processes." Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics, 2001.](https://mlanthology.org/aistats/2001/duff2001aistats-montecarlo/)

BibTeX
@inproceedings{duff2001aistats-montecarlo,
title = {{Monte-Carlo Algorithms for the Improvement of Finite-State Stochastic Controllers: Application to Bayes-Adaptive Markov Decision Processes}},
author = {Duff, Michael O.},
booktitle = {Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics},
year = {2001},
pages = {93--97},
volume = {R3},
url = {https://mlanthology.org/aistats/2001/duff2001aistats-montecarlo/}
}