Learning in POMDPs with Monte Carlo Tree Search

Abstract

The partially observable Markov decision process (POMDP) is a powerful framework for reasoning under outcome and information uncertainty, but constructing an accurate POMDP model is difficult. Bayes-Adaptive POMDPs (BA-POMDPs) extend POMDPs to allow the model to be learned during execution. BA-POMDPs are a Bayesian reinforcement-learning approach that, in principle, allows for an optimal trade-off between exploitation and exploration. Unfortunately, BA-POMDPs are currently impractical to solve for any non-trivial domain. In this paper, we extend the Monte Carlo Tree Search method POMCP to BA-POMDPs and show that the resulting method, which we call BA-POMCP, is able to tackle problems that previous solution methods have been unable to solve. Additionally, we introduce several techniques that exploit the BA-POMDP structure to improve the efficiency of BA-POMCP, along with proofs of their convergence.
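
To make the core idea concrete, below is a minimal Python sketch of the kind of search the abstract describes: the belief is a set of particles, each an augmented state consisting of a domain state plus Dirichlet counts over the unknown transition and observation dynamics; each simulation root-samples one particle and runs a UCB-guided rollout through the counts, updating them along the way as part of the state. Everything here is an illustrative assumption rather than the paper's implementation: the toy sizes N_S/N_A/N_O, the placeholder reward, and names such as BAPOMCP and plan are made up, and the sketch omits POMCP's separate expansion/rollout phase and the paper's efficiency techniques.

import math
import random
from collections import defaultdict

# Toy problem sizes and search constants (assumed, not from the paper).
N_S, N_A, N_O = 3, 2, 2
GAMMA, UCB_C, DEPTH = 0.95, 1.0, 15


def draw(counts):
    # Sample an index with probability proportional to its Dirichlet count.
    r = random.uniform(0, sum(counts))
    for i, c in enumerate(counts):
        r -= c
        if r <= 0:
            return i
    return len(counts) - 1


class BAPOMCP:
    """Sketch of POMCP run on the Bayes-adaptive augmented state (s, chi)."""

    def __init__(self):
        self.N = defaultdict(int)     # visit counts, keyed by (history, action)
        self.Q = defaultdict(float)   # action-value estimates

    def simulate(self, s, chi_t, chi_o, hist, depth):
        if depth == 0:
            return 0.0
        # UCB1 action selection at the current history node.
        n_h = sum(self.N[(hist, a)] for a in range(N_A)) + 1
        a = max(range(N_A), key=lambda a: self.Q[(hist, a)] +
                UCB_C * math.sqrt(math.log(n_h) / (self.N[(hist, a)] + 1)))
        # Bayes-adaptive dynamics: draw s' and o from the counts, then
        # update the counts, since they are part of the augmented state.
        s2 = draw(chi_t[(s, a)])
        o = draw(chi_o[(a, s2)])
        chi_t[(s, a)][s2] += 1
        chi_o[(a, s2)][o] += 1
        r = 1.0 if s2 == 0 else 0.0   # placeholder reward model
        ret = r + GAMMA * self.simulate(s2, chi_t, chi_o,
                                        hist + ((a, o),), depth - 1)
        self.N[(hist, a)] += 1
        self.Q[(hist, a)] += (ret - self.Q[(hist, a)]) / self.N[(hist, a)]
        return ret


def plan(belief_particles, n_sims=500):
    # Root sampling: each simulation starts from a copy of one belief particle.
    agent = BAPOMCP()
    for _ in range(n_sims):
        s, chi_t, chi_o = random.choice(belief_particles)
        agent.simulate(s,
                       {k: list(v) for k, v in chi_t.items()},
                       {k: list(v) for k, v in chi_o.items()},
                       (), DEPTH)
    return max(range(N_A), key=lambda a: agent.Q[((), a)])


if __name__ == "__main__":
    # Uniform Dirichlet priors over transitions and observations (assumed).
    chi_t = {(s, a): [1.0] * N_S for s in range(N_S) for a in range(N_A)}
    chi_o = {(a, s2): [1.0] * N_O for a in range(N_A) for s2 in range(N_S)}
    print("greedy root action:", plan([(0, chi_t, chi_o)]))

Note the copy of the count vectors at the root of every simulation: the counts change during a simulated trajectory, so they cannot be shared across simulations in this naive form, which is one reason the paper's structure-exploiting techniques matter for efficiency.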

Cite

Text

Katt et al. "Learning in POMDPs with Monte Carlo Tree Search." International Conference on Machine Learning, 2017.

Markdown

[Katt et al. "Learning in POMDPs with Monte Carlo Tree Search." International Conference on Machine Learning, 2017.](https://mlanthology.org/icml/2017/katt2017icml-learning/)

BibTeX

@inproceedings{katt2017icml-learning,
  title     = {{Learning in POMDPs with Monte Carlo Tree Search}},
  author    = {Katt, Sammie and Oliehoek, Frans A. and Amato, Christopher},
  booktitle = {International Conference on Machine Learning},
  year      = {2017},
  pages     = {1819--1827},
  volume    = {70},
  url       = {https://mlanthology.org/icml/2017/katt2017icml-learning/}
}