Learning Partial Policies to Speedup MDP Tree Search

Abstract

A popular approach for online decision making in large MDPs is time-bounded tree search. The effectiveness of tree search, however, is largely influenced by the action branching factor, which limits the search depth given a time bound. An obvious way to reduce action branching is to consider only a subset of potentially good ac-tions at each state as specified by a provided partial policy. In this work, we consider offline learning of such partial policies with the goal of speeding up search without significantly reduc-ing decision-making quality. Our first contribu-tion is to study learning algorithms based on re-ducing our learning problem to i.i.d. supervised learning. We give a reduction-style analysis of three such algorithms, each making different as-sumptions, which relates the supervised learning objectives to the sub-optimality of search using the learned partial policies. Our second contribu-tion is to describe concrete implementations of the algorithms within the popular framework of Monte-Carlo tree search. Finally, the third con-tribution is to evaluate the learning algorithms in two challenging MDPs with large action branch-ing factors, showing that the learned partial poli-cies can significantly improve the anytime per-formance of Monte-Carlo tree search. 1

Cite

Text

Pinto and Fern. "Learning Partial Policies to Speedup MDP Tree Search." Conference on Uncertainty in Artificial Intelligence, 2014.

Markdown

[Pinto and Fern. "Learning Partial Policies to Speedup MDP Tree Search." Conference on Uncertainty in Artificial Intelligence, 2014.](https://mlanthology.org/uai/2014/pinto2014uai-learning/)

BibTeX

@inproceedings{pinto2014uai-learning,
  title     = {{Learning Partial Policies to Speedup MDP Tree Search}},
  author    = {Pinto, Jervis and Fern, Alan},
  booktitle = {Conference on Uncertainty in Artificial Intelligence},
  year      = {2014},
  pages     = {672-681},
  url       = {https://mlanthology.org/uai/2014/pinto2014uai-learning/}
}