Bandit Based Monte-Carlo Planning

Kocsis, Levente; Szepesvári, Csaba

doi:10.1007/11871842_29

ECML-PKDD 2006 pp. 282-293

Abstract

For large state-space Markovian Decision Problems, Monte-Carlo planning is one of the few viable approaches to find near-optimal solutions. In this paper we introduce a new algorithm, UCT, that applies bandit ideas to guide Monte-Carlo planning. In finite-horizon or discounted MDPs, the algorithm is shown to be consistent, and finite-sample bounds are derived on the estimation error due to sampling. Experimental results show that in several domains UCT is significantly more efficient than its alternatives.

Cite

Text

Kocsis and Szepesvári. "Bandit Based Monte-Carlo Planning." European Conference on Machine Learning, 2006. doi:10.1007/11871842_29

Markdown

[Kocsis and Szepesvári. "Bandit Based Monte-Carlo Planning." European Conference on Machine Learning, 2006.](https://mlanthology.org/ecmlpkdd/2006/kocsis2006ecml-bandit/) doi:10.1007/11871842_29

BibTeX

@inproceedings{kocsis2006ecml-bandit,
  title     = {{Bandit Based Monte-Carlo Planning}},
  author    = {Kocsis, Levente and Szepesvári, Csaba},
  booktitle = {European Conference on Machine Learning},
  year      = {2006},
  pages     = {282--293},
  doi       = {10.1007/11871842_29},
  url       = {https://mlanthology.org/ecmlpkdd/2006/kocsis2006ecml-bandit/}
}