Learning Partial Policies to Speedup MDP Tree Search
Abstract
A popular approach for online decision making in large MDPs is time-bounded tree search. The effectiveness of tree search, however, is largely influenced by the action branching factor, which limits the search depth given a time bound. An obvious way to reduce action branching is to consider only a subset of potentially good actions at each state as specified by a provided partial policy. In this work, we consider offline learning of such partial policies with the goal of speeding up search without significantly reducing decision-making quality. Our first contribution is to study learning algorithms based on reducing our learning problem to i.i.d. supervised learning. We give a reduction-style analysis of three such algorithms, each making different assumptions, which relates the supervised learning objectives to the sub-optimality of search using the learned partial policies. Our second contribution is to describe concrete implementations of the algorithms within the popular framework of Monte-Carlo tree search. Finally, the third contribution is to evaluate the learning algorithms in two challenging MDPs with large action branching factors, showing that the learned partial policies can significantly improve the anytime performance of Monte-Carlo tree search.
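To make the core idea concrete, here is a minimal sketch (not the paper's implementation) of UCT-style action selection in which a learned partial policy prunes the action set at each node; the `partial_policy` function, the `Node` structure, and the exploration constant are hypothetical stand-ins.

```python
import math
import random

class Node:
    """A search-tree node holding visit statistics for one MDP state."""
    def __init__(self, state):
        self.state = state
        self.visits = 0
        self.value = 0.0      # sum of sampled returns through this node
        self.children = {}    # action -> child Node

def uct_action(node, partial_policy, c=1.4):
    """Select an action at `node`, considering only the subset of actions
    the partial policy keeps. A smaller action branching factor lets the
    search reach greater depth within the same time bound."""
    candidates = partial_policy(node.state)  # pruned action subset
    unvisited = [a for a in candidates if a not in node.children]
    if unvisited:
        return random.choice(unvisited)      # expand pruned-in actions first
    def ucb(a):
        child = node.children[a]
        return (child.value / child.visits
                + c * math.sqrt(math.log(node.visits) / child.visits))
    return max(candidates, key=ucb)
```

The trade-off this sketch exposes is exactly what the paper analyzes: pruning speeds up search only if the partial policy retains a near-optimal action at each state, and the reduction-style analysis relates the supervised learning error of the partial policy to the resulting search sub-optimality.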
Cite
Pinto and Fern. "Learning Partial Policies to Speedup MDP Tree Search." Conference on Uncertainty in Artificial Intelligence, 2014.
BibTeX
@inproceedings{pinto2014uai-learning,
title = {{Learning Partial Policies to Speedup MDP Tree Search}},
author = {Pinto, Jervis and Fern, Alan},
booktitle = {Conference on Uncertainty in Artificial Intelligence},
year = {2014},
pages = {672--681},
url = {https://mlanthology.org/uai/2014/pinto2014uai-learning/}
}