Improving Exploration in UCT Using Local Manifolds

Abstract

Monte-Carlo planning has proven successful in many sequential decision-making settings, but it suffers from poor exploration when the rewards are sparse. In this paper, we improve exploration in UCT by generalizing across similar states using a given distance metric. We show that this algorithm, like UCT, converges asymptotically to the optimal action. When the state space does not have a natural distance metric, we show how we can learn a local manifold from the transition graph of states in the near future to obtain a distance metric. On domains inspired by video games, empirical evidence shows that our algorithm is more sample efficient than UCT, particularly when rewards are sparse.
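To make the high-level idea concrete, here is a minimal, hypothetical sketch (not the paper's algorithm) of UCB-style action selection in which visit counts and returns are pooled across similar states through a kernel over a supplied distance metric. The `SimilarityUCT` class, the Gaussian `kernel`, the `bandwidth`, and the exploration constant `c` are all illustrative assumptions; the paper's exact pooling rule and confidence bound may differ.

```python
import math
from collections import defaultdict

def kernel(dist, bandwidth=1.0):
    """Similarity weight that decays with distance; a common Gaussian choice,
    not necessarily the one used in the paper."""
    return math.exp(-(dist / bandwidth) ** 2)

class SimilarityUCT:
    """Illustrative UCB-style selection that shares statistics across
    similar states via a user-supplied distance metric (a sketch only)."""

    def __init__(self, distance, c=1.4, bandwidth=1.0):
        self.distance = distance          # distance(s1, s2) -> float
        self.c = c                        # exploration constant
        self.bandwidth = bandwidth
        self.visits = defaultdict(float)  # (state, action) -> weighted count
        self.values = defaultdict(float)  # (state, action) -> weighted return sum
        self.seen = set()                 # states with recorded statistics

    def update(self, state, action, ret):
        """Record one simulated return for (state, action)."""
        self.seen.add(state)
        self.visits[(state, action)] += 1.0
        self.values[(state, action)] += ret

    def pooled(self, state, action):
        """Weight each recorded state's statistics by its similarity to `state`."""
        n, v = 0.0, 0.0
        for s in self.seen:
            w = kernel(self.distance(state, s), self.bandwidth)
            n += w * self.visits[(s, action)]
            v += w * self.values[(s, action)]
        return n, v

    def select(self, state, actions):
        """UCB1-style rule computed over similarity-pooled counts and values."""
        totals = [self.pooled(state, a) for a in actions]
        total_n = sum(n for n, _ in totals)
        best, best_score = None, -float("inf")
        for a, (n, v) in zip(actions, totals):
            if n == 0.0:
                return a  # try actions untried even by similar states first
            # log(1 + total_n) keeps the bonus well-defined for small pooled counts
            score = v / n + self.c * math.sqrt(math.log(1.0 + total_n) / n)
            if score > best_score:
                best, best_score = a, score
        return best
```

In the setting described by the abstract, the `distance` argument would either be a natural metric on the state space or one derived from the learned local manifold over the short-horizon transition graph.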

Cite

Text

Srinivasan et al. "Improving Exploration in UCT Using Local Manifolds." AAAI Conference on Artificial Intelligence, 2015. doi:10.1609/AAAI.V29I1.9660

Markdown

[Srinivasan et al. "Improving Exploration in UCT Using Local Manifolds." AAAI Conference on Artificial Intelligence, 2015.](https://mlanthology.org/aaai/2015/srinivasan2015aaai-improving/) doi:10.1609/AAAI.V29I1.9660

BibTeX

@inproceedings{srinivasan2015aaai-improving,
  title     = {{Improving Exploration in UCT Using Local Manifolds}},
  author    = {Srinivasan, Sriram and Talvitie, Erik and Bowling, Michael H.},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2015},
  pages     = {3386--3392},
  doi       = {10.1609/AAAI.V29I1.9660},
  url       = {https://mlanthology.org/aaai/2015/srinivasan2015aaai-improving/}
}