Bootstrapping from Game Tree Search

Abstract

In this paper we introduce a new algorithm for updating the parameters of a heuristic evaluation function, by updating the heuristic towards the values computed by an alpha-beta search. Our algorithm differs from previous approaches to learning from search, such as Samuel's checkers player and the TD-Leaf algorithm, in two key ways. First, we update all nodes in the search tree, rather than a single node. Second, we use the outcome of a deep search, instead of the outcome of a subsequent search, as the training signal for the evaluation function. We implemented our algorithm in a chess program, Meep, using a linear heuristic function. After initialising its weight vector to small random values, Meep was able to learn high-quality weights from self-play alone. When tested online against human opponents, Meep played at a master level, the best performance of any chess program with a heuristic learned entirely from self-play.
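The core idea in the abstract, training every node of the search tree toward its backed-up search value, can be sketched in a few lines. The following is a minimal, illustrative sketch only, not the paper's implementation: it uses plain depth-limited minimax rather than alpha-beta, a synthetic game tree, and a made-up feature map; all identifiers (`features`, `treestrap_step`, etc.) are assumptions introduced here for illustration.

```python
import random

N_FEATURES = 4  # size of the toy linear feature vector

def features(state):
    # Hypothetical feature map: deterministic pseudo-features keyed on the
    # integer state id (stands in for real board features like material).
    rnd = random.Random(state)
    return [rnd.uniform(-1, 1) for _ in range(N_FEATURES)]

def children(state, branching=3):
    # Toy game tree: every state has `branching` children derived from its id.
    return [state * branching + i + 1 for i in range(branching)]

def evaluate(w, state):
    # Linear heuristic evaluation V(s) = w . phi(s), as in the abstract.
    return sum(wi * fi for wi, fi in zip(w, features(state)))

def minimax(w, state, depth, maximizing, targets):
    # Depth-limited minimax; record the backed-up value of EVERY interior
    # node in `targets` (the key difference from TD-Leaf, which uses only
    # a single node per search).
    if depth == 0:
        return evaluate(w, state)
    vals = [minimax(w, c, depth - 1, not maximizing, targets)
            for c in children(state)]
    v = max(vals) if maximizing else min(vals)
    targets.append((state, v))
    return v

def treestrap_step(w, root, depth=3, eta=0.02):
    # One search-then-update step: move each node's static evaluation
    # toward the value the deep search computed for it (delta rule).
    targets = []
    minimax(w, root, depth, True, targets)
    for state, v_search in targets:
        phi = features(state)
        err = v_search - evaluate(w, state)
        for i in range(N_FEATURES):
            w[i] += eta * err * phi[i]
    return w

random.seed(0)
w = [random.uniform(-0.01, 0.01) for _ in range(N_FEATURES)]
w0 = list(w)
for _ in range(10):
    treestrap_step(w, root=0)
```

Note the training signal here comes from the current deep search itself, not from the outcome of a subsequent search, mirroring the second distinction drawn in the abstract.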

Cite

Text

Veness et al. "Bootstrapping from Game Tree Search." Neural Information Processing Systems, 2009.

Markdown

[Veness et al. "Bootstrapping from Game Tree Search." Neural Information Processing Systems, 2009.](https://mlanthology.org/neurips/2009/veness2009neurips-bootstrapping/)

BibTeX

@inproceedings{veness2009neurips-bootstrapping,
  title     = {{Bootstrapping from Game Tree Search}},
  author    = {Veness, Joel and Silver, David and Blair, Alan and Uther, William},
  booktitle = {Neural Information Processing Systems},
  year      = {2009},
  pages     = {1937--1945},
  url       = {https://mlanthology.org/neurips/2009/veness2009neurips-bootstrapping/}
}