Single-Agent Policy Tree Search with Guarantees

Abstract

We introduce two novel tree search algorithms that use a policy to guide search. The first algorithm is a best-first enumeration that uses a cost function that allows us to provide an upper bound on the number of nodes to be expanded before reaching a goal state. We show that this best-first algorithm is particularly well suited for "needle-in-a-haystack" problems. The second algorithm, which is based on sampling, provides an upper bound on the expected number of nodes to be expanded before reaching a set of goal states. We show that this algorithm is better suited for problems where many paths lead to a goal. We validate these tree search algorithms on 1,000 computer-generated levels of Sokoban, where the policy used to guide search comes from a neural network trained using A3C. Our results show that the policy tree search algorithms we introduce are competitive with a state-of-the-art domain-independent planner that uses heuristic search.
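To make the idea of a policy-guided best-first enumeration concrete, here is a minimal Python sketch. The abstract does not state the cost function, so the sketch assumes one illustrative choice: a node's cost is its depth (counting the root as 1) divided by the probability the policy assigns to the path reaching it. The function name `policy_best_first`, the toy binary tree, and the fixed 0.7/0.3 policy are all hypothetical and are not taken from the paper.

```python
import heapq
from itertools import count


def policy_best_first(root, children, policy, is_goal, max_expansions=10_000):
    """Best-first search guided by a policy.

    Assumed cost (not stated in the abstract): (depth + 1) / path_probability,
    where path_probability is the product of policy probabilities along the path.
    """
    tie = count()  # tie-breaker so the heap never has to compare states
    # Heap entries: (cost, tie, state, depth, path_probability)
    frontier = [(1.0, next(tie), root, 0, 1.0)]
    expansions = 0
    while frontier and expansions < max_expansions:
        cost, _, state, depth, prob = heapq.heappop(frontier)
        expansions += 1
        if is_goal(state):
            return state, expansions, cost
        for action, child in children(state):
            child_prob = prob * policy(state, action)
            if child_prob > 0.0:
                child_cost = (depth + 2) / child_prob
                heapq.heappush(
                    frontier, (child_cost, next(tie), child, depth + 1, child_prob)
                )
    return None, expansions, float("inf")


# Hypothetical toy problem: an infinite binary tree whose states are bit strings.
def toy_children(state):
    return [("0", state + "0"), ("1", state + "1")]


def toy_policy(state, action):
    # Fixed policy that prefers action "1" regardless of the state.
    return 0.7 if action == "1" else 0.3


found, expansions, goal_cost = policy_best_first(
    root="",
    children=toy_children,
    policy=toy_policy,
    is_goal=lambda s: s == "1011",
)
print(found, expansions, goal_cost)
```

Under this assumed cost, the cost never decreases along a path, so every node expanded before the goal costs no more than the goal itself. In this toy run the number of expansions therefore stays at or below the goal's cost, which illustrates the kind of upper bound the abstract refers to.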

Cite

Text

Orseau et al. "Single-Agent Policy Tree Search with Guarantees." Neural Information Processing Systems, 2018.

Markdown

[Orseau et al. "Single-Agent Policy Tree Search with Guarantees." Neural Information Processing Systems, 2018.](https://mlanthology.org/neurips/2018/orseau2018neurips-singleagent/)

BibTeX

@inproceedings{orseau2018neurips-singleagent,
  title     = {{Single-Agent Policy Tree Search with Guarantees}},
  author    = {Orseau, Laurent and Lelis, Levi and Lattimore, Tor and Weber, Theophane},
  booktitle = {Neural Information Processing Systems},
  year      = {2018},
  pages     = {3201--3211},
  url       = {https://mlanthology.org/neurips/2018/orseau2018neurips-singleagent/}
}