Bayes Adaptive Monte Carlo Tree Search for Offline Model-Based Reinforcement Learning

Abstract

Offline reinforcement learning (RL) is a powerful approach for data-driven decision-making and control. Unlike model-free methods, offline model-based reinforcement learning (MBRL) explicitly learns world models from a static dataset and uses them as surrogate simulators, which improves data efficiency and can enable the learned policy to generalize beyond the dataset support. However, many different MDPs may behave identically on the offline dataset, and handling the resulting uncertainty about the true MDP is challenging. In this paper, we propose modeling offline MBRL as a Bayes Adaptive Markov Decision Process (BAMDP), a principled framework for addressing model uncertainty. We further propose a novel Bayes Adaptive Monte Carlo planning algorithm capable of solving BAMDPs in continuous state and action spaces with stochastic transitions. This planning process is based on Monte Carlo Tree Search and can be integrated into offline MBRL as a policy improvement operator in policy iteration. Our "RL + Search" framework follows in the footsteps of superhuman AIs like AlphaZero, improving on current offline MBRL methods by using more computation. The proposed algorithm significantly outperforms state-of-the-art offline RL methods on twelve D4RL MuJoCo tasks and three challenging, stochastic tokamak control tasks.
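
The Bayes-adaptive idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm. It only shows how a belief over a finite set of candidate dynamics models might be updated by Bayes' rule after observing one transition. The finite model set, the 1-D state, the Gaussian noise model, and the names `update_belief`, `models`, and `noise_std` are all assumptions made for this example.

```python
import math


def gaussian_log_likelihood(x, mean, std):
    """Log density of a 1-D Gaussian N(mean, std^2) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * std * std) - 0.5 * ((x - mean) / std) ** 2


def update_belief(belief, models, state, action, next_state, noise_std=0.1):
    """Apply one Bayes-rule update to a belief over candidate models.

    belief: prior probabilities, one per candidate model (assumed to sum to 1).
    models: hypothetical functions f(state, action) -> predicted next state.
    noise_std: assumed Gaussian observation noise (illustrative value).
    """
    log_post = []
    for prior, model in zip(belief, models):
        log_prior = math.log(prior) if prior > 0 else float("-inf")
        predicted = model(state, action)
        log_post.append(log_prior + gaussian_log_likelihood(next_state, predicted, noise_std))
    # Normalize in log space for numerical stability.
    max_log = max(log_post)
    weights = [math.exp(lp - max_log) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


# Two hypothetical 1-D dynamics models that agree when action == 0
# but disagree for any other action.
models = [
    lambda s, a: s + a,
    lambda s, a: s + 2.0 * a,
]
belief = [0.5, 0.5]

# A transition with action 0 cannot distinguish the models.
belief_same = update_belief(belief, models, state=1.0, action=0.0, next_state=1.0)

# A transition with action 1 that matches model 0 shifts the belief toward it.
belief_new = update_belief(belief, models, state=1.0, action=1.0, next_state=2.0)
```

In this toy setting, data collected only at `action == 0` leaves both models equally plausible, which mirrors the situation where several MDPs behave identically on the offline dataset. A Bayes-adaptive planner would carry such a belief along with the state during search rather than committing to a single model.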

Cite

Text

Chen et al. "Bayes Adaptive Monte Carlo Tree Search for Offline Model-Based Reinforcement Learning." International Conference on Learning Representations, 2026.

Markdown

[Chen et al. "Bayes Adaptive Monte Carlo Tree Search for Offline Model-Based Reinforcement Learning." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/chen2026iclr-bayes/)

BibTeX

@inproceedings{chen2026iclr-bayes,
  title     = {{Bayes Adaptive Monte Carlo Tree Search for Offline Model-Based Reinforcement Learning}},
  author    = {Chen, Jiayu and Xu, Le and Chen, Wentse and Schneider, Jeff},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/chen2026iclr-bayes/}
}