Power Mean Estimation in Stochastic Monte-Carlo Tree Search
Abstract
Monte-Carlo Tree Search (MCTS) is a widely-used strategy for online planning that combines Monte-Carlo sampling with forward tree search. Its success relies on the Upper Confidence bound for Trees (UCT) algorithm, an extension of the UCB method for multi-arm bandits. However, the theoretical foundation of UCT is incomplete due to an error in the logarithmic bonus term for action selection, leading to the development of Fixed-Depth-MCTS with a polynomial exploration bonus to balance exploration and exploitation. Both UCT and Fixed-Depth-MCTS suffer from biased value estimation: the weighted sum underestimates the optimal value, while the maximum valuation overestimates it. The power mean estimator offers a balanced solution, lying between the average and maximum values. Power-UCT incorporates this estimator for more accurate value estimates, but its theoretical proof remains incomplete. This paper introduces Stochastic-Power-UCT, an MCTS algorithm for stochastic MDPs using the power mean estimator. We analyze its polynomial convergence in estimating root node values and show that it shares the same convergence rate of $\mathcal{O}(n^{-1/2})$, with $n$ is the number of visited trajectories, as Fixed-Depth-MCTS, with the latter being a special case of the former. Our theoretical results are validated with empirical tests across various stochastic MDP environments.
Cite
Text
Dam et al. "Power Mean Estimation in Stochastic Monte-Carlo Tree Search." ICML 2024 Workshops: RLControlTheory, 2024.Markdown
[Dam et al. "Power Mean Estimation in Stochastic Monte-Carlo Tree Search." ICML 2024 Workshops: RLControlTheory, 2024.](https://mlanthology.org/icmlw/2024/dam2024icmlw-power/)BibTeX
@inproceedings{dam2024icmlw-power,
title = {{Power Mean Estimation in Stochastic Monte-Carlo Tree Search}},
author = {Dam, Tuan Quang and Maillard, Odalric-Ambrym and Kaufmann, Emilie},
booktitle = {ICML 2024 Workshops: RLControlTheory},
year = {2024},
url = {https://mlanthology.org/icmlw/2024/dam2024icmlw-power/}
}