AlphaZero-Based Proof Cost Network to Aid Game Solving

Abstract

The AlphaZero algorithm learns and plays games without hand-crafted expert knowledge. However, since its objective is to play well, we hypothesize that a better objective can be defined for the related but separate task of solving games. This paper proposes a novel approach to solving problems by modifying the training target of the AlphaZero algorithm so that it prioritizes solving the game quickly rather than winning. We train a Proof Cost Network (PCN), where proof cost is a heuristic that estimates the amount of work required to solve a problem. This matches the general concept of the proof number from proof number search, which has been shown to be well suited to game solving. We propose two specific training targets: the first finds the shortest path to a solution, while the second estimates the proof cost. We conduct experiments on solving 15x15 Gomoku and 9x9 Killall-Go problems with both MCTS-based solvers and focused depth-first proof number search (FDFPN) solvers. Comparisons between using AlphaZero networks and PCN as heuristics show that PCN can solve more problems.
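
For readers unfamiliar with the proof number mentioned above, the following is a minimal sketch of the classical proof-number bookkeeping on an AND/OR tree. It is not the paper's PCN or its solvers. The names `Node` and `proof_numbers` are hypothetical and chosen for illustration only. The sketch assumes the standard convention that an unexpanded, unsolved leaf starts with proof and disproof numbers of 1.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

INF = float("inf")


@dataclass
class Node:
    """Hypothetical AND/OR tree node (illustration only, not from the paper)."""
    is_or: bool                              # True: OR node (the prover moves)
    children: List["Node"] = field(default_factory=list)
    solved: Optional[bool] = None            # True: proven, False: disproven, None: unknown


def proof_numbers(node: Node) -> Tuple[float, float]:
    """Return (proof number, disproof number) under the classical rules."""
    if node.solved is True:
        return 0, INF                        # nothing left to prove
    if node.solved is False:
        return INF, 0                        # can never be proven
    if not node.children:
        return 1, 1                          # unexpanded leaf: unit cost (assumed convention)

    pairs = [proof_numbers(child) for child in node.children]
    pns = [pn for pn, _ in pairs]
    dns = [dn for _, dn in pairs]
    if node.is_or:
        # Proving an OR node needs one proven child; disproving needs all children.
        return min(pns), sum(dns)
    # Proving an AND node needs all children; disproving needs one child.
    return sum(pns), min(dns)


# Usage: an OR root with two AND children holding 3 and 2 unexpanded leaves.
leaf = lambda: Node(is_or=True)
root = Node(is_or=True, children=[
    Node(is_or=False, children=[leaf(), leaf(), leaf()]),
    Node(is_or=False, children=[leaf(), leaf()]),
])
pn, dn = proof_numbers(root)
```

In this toy tree the second child is the cheaper branch to prove, which illustrates why a proof-number-like quantity is a natural estimate of the remaining work when solving rather than playing.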

Cite

Text

Wu et al. "AlphaZero-Based Proof Cost Network to Aid Game Solving." International Conference on Learning Representations, 2022.

Markdown

[Wu et al. "AlphaZero-Based Proof Cost Network to Aid Game Solving." International Conference on Learning Representations, 2022.](https://mlanthology.org/iclr/2022/wu2022iclr-alphazerobased/)

BibTeX

@inproceedings{wu2022iclr-alphazerobased,
  title     = {{AlphaZero-Based Proof Cost Network to Aid Game Solving}},
  author    = {Wu, Ti-Rong and Shih, Chung-Chin and Wei, Ting Han and Tsai, Meng-Yu and Hsu, Wei-Yuan and Wu, I-Chen},
  booktitle = {International Conference on Learning Representations},
  year      = {2022},
  url       = {https://mlanthology.org/iclr/2022/wu2022iclr-alphazerobased/}
}