Counterfactual Online Learning for Open-Loop Monte-Carlo Planning
Abstract
Monte-Carlo Tree Search (MCTS) is a popular approach to online planning under uncertainty. While MCTS uses statistical sampling via multi-armed bandits to avoid exhaustive search in complex domains, common closed-loop approaches typically construct enormous search trees to consider a large number of potential observations and actions. On the other hand, open-loop approaches offer better memory efficiency by ignoring observations but are generally not competitive with closed-loop MCTS in terms of performance - even with commonly integrated human knowledge. In this paper, we propose Counterfactual Open-loop Reasoning with Ad hoc Learning (CORAL) for open-loop MCTS, using a causal multi-armed bandit approach with unobserved confounders (MABUC). CORAL consists of two online learning phases that are conducted during the open-loop search. In the first phase, observational values are learned based on preferred actions. In the second phase, counterfactual values are learned with MABUCs to make a decision via an intent policy obtained from the observational values. We evaluate CORAL in four POMDP benchmark scenarios and compare it with closed-loop and open-loop alternatives. In contrast to standard open-loop MCTS, CORAL achieves competitive performance compared with closed-loop algorithms while constructing significantly smaller search trees.
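The two learning phases described in the abstract can be illustrated with a toy single-node bandit. This is only a sketch under stated assumptions, not the CORAL algorithm from the paper: the class name `TwoPhaseBandit`, the fixed `phase_one_budget`, the UCB1 selection rule, and the greedy intent policy are all hypothetical choices made for illustration. The first phase learns one observational value per action. The second phase reads an intent (the action the observational values prefer) and learns a separate value for each (intent, action) pair.

```python
import math


class TwoPhaseBandit:
    """Toy two-phase bandit (illustrative only, not the paper's method)."""

    def __init__(self, num_actions, phase_one_budget, exploration=1.0):
        self.num_actions = num_actions
        self.phase_one_budget = phase_one_budget  # assumed fixed phase split
        self.exploration = exploration
        self.total_pulls = 0
        # Phase 1 statistics: one running mean per action.
        self.obs_counts = [0] * num_actions
        self.obs_values = [0.0] * num_actions
        # Phase 2 statistics: one running mean per (intent, action) pair.
        self.cf_counts = [[0] * num_actions for _ in range(num_actions)]
        self.cf_values = [[0.0] * num_actions for _ in range(num_actions)]

    def in_phase_one(self):
        return self.total_pulls < self.phase_one_budget

    def intent(self):
        # Assumed intent policy: greedy with respect to observational values.
        return max(range(self.num_actions), key=lambda a: self.obs_values[a])

    @staticmethod
    def _ucb1(counts, values, c):
        # Try every untried action once, then maximize the UCB1 score.
        for action, n in enumerate(counts):
            if n == 0:
                return action
        total = sum(counts)
        return max(
            range(len(counts)),
            key=lambda a: values[a] + c * math.sqrt(math.log(total) / counts[a]),
        )

    def select(self):
        """Return (intent, action); intent is None during phase 1."""
        if self.in_phase_one():
            action = self._ucb1(self.obs_counts, self.obs_values, self.exploration)
            return None, action
        i = self.intent()
        return i, self._ucb1(self.cf_counts[i], self.cf_values[i], self.exploration)

    def update(self, intent, action, reward):
        self.total_pulls += 1
        if intent is None:
            counts, values = self.obs_counts, self.obs_values
        else:
            counts, values = self.cf_counts[intent], self.cf_values[intent]
        counts[action] += 1
        # Incremental running-mean update.
        values[action] += (reward - values[action]) / counts[action]
```

In this toy the observational values stop changing once the second phase starts, so the intent stays fixed and only one row of the (intent, action) table is filled. In a search tree with unobserved confounders the intent would vary between simulations, which is what makes conditioning on it informative.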
Cite
Text
Phan et al. "Counterfactual Online Learning for Open-Loop Monte-Carlo Planning." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I25.34867
Markdown
[Phan et al. "Counterfactual Online Learning for Open-Loop Monte-Carlo Planning." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/phan2025aaai-counterfactual/) doi:10.1609/AAAI.V39I25.34867
BibTeX
@inproceedings{phan2025aaai-counterfactual,
title = {{Counterfactual Online Learning for Open-Loop Monte-Carlo Planning}},
author = {Phan, Thomy and Chan, Shao-Hung and Koenig, Sven},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2025},
pages = {26651-26658},
doi = {10.1609/AAAI.V39I25.34867},
url = {https://mlanthology.org/aaai/2025/phan2025aaai-counterfactual/}
}