Improving Exploration in UCT Using Local Manifolds
Abstract
Monte-Carlo planning has been proven successful in many sequential decision-making settings, but it suffers from poor exploration when the rewards are sparse. In this paper, we improve exploration in UCT by generalizing across similar states using a given distance metric. We show that this algorithm, like UCT, converges asymptotically to the optimal action. When the state space does not have a natural distance metric, we show how we can learn a local manifold from the transition graph of states in the near future to obtain a distance metric. On domains inspired by video games, empirical evidence shows that our algorithm is more sample efficient than UCT, particularly when rewards are sparse.
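The core idea described in the abstract is to let a state's value estimate in UCT borrow statistics from nearby states under a distance metric. The sketch below is only an illustration of that generalization idea, not the paper's actual algorithm: the Gaussian kernel, the `bandwidth` parameter, and all function names are assumptions introduced here for exposition.

```python
import math
from collections import defaultdict

# Hypothetical per-state statistics kept by the tree search:
# visit counts and accumulated returns (assumed bookkeeping, not the paper's).
counts = defaultdict(int)
returns = defaultdict(float)

def kernel(d, bandwidth=1.0):
    # Similarity weight from a distance d under the given (or learned) metric.
    # A Gaussian kernel is an assumption made for this sketch.
    return math.exp(-(d / bandwidth) ** 2)

def smoothed_value(state, neighbors, distance):
    """Value estimate that pools statistics from similar states.

    `neighbors` is an iterable of previously visited states and
    `distance(s1, s2)` is the metric used to measure similarity.
    """
    w_sum, v_sum = 0.0, 0.0
    for other in neighbors:
        if counts[other] == 0:
            continue
        w = kernel(distance(state, other))
        w_sum += w * counts[other]
        v_sum += w * returns[other]
    return v_sum / w_sum if w_sum > 0 else 0.0

def ucb_score(state, neighbors, distance, total_visits, c=1.4):
    # UCB1-style selection score: smoothed value plus an exploration bonus.
    n = max(counts[state], 1)
    bonus = c * math.sqrt(math.log(max(total_visits, 1)) / n)
    return smoothed_value(state, neighbors, distance) + bonus
```

Because sparsely rewarded states share information with their neighbors, states that have never been visited can still receive informative value estimates, which is the exploration benefit the abstract refers to.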
Cite
Text
Srinivasan et al. "Improving Exploration in UCT Using Local Manifolds." AAAI Conference on Artificial Intelligence, 2015. doi:10.1609/AAAI.V29I1.9660
Markdown
[Srinivasan et al. "Improving Exploration in UCT Using Local Manifolds." AAAI Conference on Artificial Intelligence, 2015.](https://mlanthology.org/aaai/2015/srinivasan2015aaai-improving/) doi:10.1609/AAAI.V29I1.9660
BibTeX
@inproceedings{srinivasan2015aaai-improving,
title = {{Improving Exploration in UCT Using Local Manifolds}},
author = {Srinivasan, Sriram and Talvitie, Erik and Bowling, Michael H.},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2015},
pages = {3386-3392},
doi = {10.1609/AAAI.V29I1.9660},
url = {https://mlanthology.org/aaai/2015/srinivasan2015aaai-improving/}
}