Provably Efficient Long-Horizon Exploration in Monte Carlo Tree Search Through State Occupancy Regularization

Abstract

Monte Carlo tree search (MCTS) has been successful in a variety of domains, but faces challenges with long-horizon exploration when compared to sampling-based motion planning algorithms like Rapidly-exploring Random Trees (RRTs). To address these limitations, we derive a tree search algorithm based on policy optimization with state-occupancy measure regularization, which we call Volume-MCTS. We show that count-based exploration and sampling-based motion planning can be derived as approximate solutions to this state-occupancy-measure-regularized objective. We test our method on several robot navigation problems, and find that Volume-MCTS outperforms AlphaZero and displays significantly better long-horizon exploration properties.
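The abstract notes that count-based exploration arises as an approximate solution to the state-occupancy-regularized objective. The sketch below is not the paper's Volume-MCTS implementation; it is a minimal, hypothetical illustration of how a count-based exploration bonus can be added to a standard UCT child-selection rule, with all names (`uct_select`, the `c_count` weight, the dict keys) chosen for this example.

```python
import math

def uct_select(children, c_uct=1.4, c_count=1.0):
    """Pick the child maximizing a UCT score plus a count-based
    exploration bonus (illustrative only; not the paper's method).
    Each child is a dict with 'value_sum', 'visits', 'parent_visits'."""
    def score(ch):
        n = ch['visits']
        if n == 0:
            return float('inf')  # always expand unvisited children first
        q = ch['value_sum'] / n                                  # mean value estimate
        uct = c_uct * math.sqrt(math.log(ch['parent_visits']) / n)  # standard UCT term
        count_bonus = c_count / math.sqrt(n)                     # count-based exploration term
        return q + uct + count_bonus
    return max(children, key=score)
```

Example usage: given one well-visited child and one unvisited child, the rule selects the unvisited child; once all children have visits, the trade-off between mean value and the exploration terms decides.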

Cite

Text

Schramm and Boularias. "Provably Efficient Long-Horizon Exploration in Monte Carlo Tree Search Through State Occupancy Regularization." International Conference on Machine Learning, 2024.

Markdown

[Schramm and Boularias. "Provably Efficient Long-Horizon Exploration in Monte Carlo Tree Search Through State Occupancy Regularization." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/schramm2024icml-provably/)

BibTeX

@inproceedings{schramm2024icml-provably,
  title     = {{Provably Efficient Long-Horizon Exploration in Monte Carlo Tree Search Through State Occupancy Regularization}},
  author    = {Schramm, Liam and Boularias, Abdeslam},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {43828--43861},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/schramm2024icml-provably/}
}