Provably Efficient Long-Horizon Exploration in Monte Carlo Tree Search Through State Occupancy Regularization
Abstract
Monte Carlo tree search (MCTS) has been successful in a variety of domains, but faces challenges with long-horizon exploration when compared to sampling-based motion planning algorithms like Rapidly-Exploring Random Trees. To address these limitations of MCTS, we derive a tree search algorithm based on policy optimization with state-occupancy measure regularization, which we call Volume-MCTS. We show that count-based exploration and sampling-based motion planning can be derived as approximate solutions to this state-occupancy measure regularized objective. We test our method on several robot navigation problems, and find that Volume-MCTS outperforms AlphaZero and displays significantly better long-horizon exploration properties.
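The abstract notes that count-based exploration can be derived as an approximate solution to the state-occupancy regularized objective. As a rough illustration only (not the paper's Volume-MCTS algorithm), the sketch below shows how a count-based bonus of the form 1/sqrt(visits) can be folded into a standard UCT child-selection rule; the `Node` class, constants, and function names are all hypothetical.

```python
import math


class Node:
    """Minimal MCTS tree node (illustrative, not from the paper)."""

    def __init__(self, state):
        self.state = state
        self.visits = 0
        self.value_sum = 0.0
        self.children = []

    def mean_value(self):
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(node, c_uct=1.4, c_count=1.0):
    """Pick the child maximizing a UCT score plus a count-based
    exploration bonus ~ 1/sqrt(visits). The bonus is the generic
    count-based form; the constants are arbitrary placeholders."""

    def score(child):
        if child.visits == 0:
            # Unvisited children are always explored first.
            return float("inf")
        uct = child.mean_value() + c_uct * math.sqrt(
            math.log(node.visits) / child.visits
        )
        bonus = c_count / math.sqrt(child.visits)
        return uct + bonus

    return max(node.children, key=score)
```

For example, with two children where one is unvisited, `select_child` returns the unvisited child; once counts are equal, the child with the higher mean value is preferred.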
Cite
Text
Schramm and Boularias. "Provably Efficient Long-Horizon Exploration in Monte Carlo Tree Search Through State Occupancy Regularization." International Conference on Machine Learning, 2024.
Markdown
[Schramm and Boularias. "Provably Efficient Long-Horizon Exploration in Monte Carlo Tree Search Through State Occupancy Regularization." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/schramm2024icml-provably/)
BibTeX
@inproceedings{schramm2024icml-provably,
title = {{Provably Efficient Long-Horizon Exploration in Monte Carlo Tree Search Through State Occupancy Regularization}},
author = {Schramm, Liam and Boularias, Abdeslam},
booktitle = {International Conference on Machine Learning},
year = {2024},
  pages = {43828--43861},
volume = {235},
url = {https://mlanthology.org/icml/2024/schramm2024icml-provably/}
}