Generalized Weighted Path Consistency for Mastering Atari Games

Abstract

Reinforcement learning with neural-guided search consumes huge computational resources to achieve remarkable performance. Path consistency (PC), i.e., that the $f$ values on an optimal path should be identical, was previously imposed on MCTS by PCZero to improve the learning efficiency of AlphaZero. However, PCZero not only lacks theoretical support but also considers only board games. In this paper, PCZero is generalized into GW-PCZero for real applications with non-zero immediate rewards. A weighting mechanism is introduced to reduce the variance of the $f$-value estimation caused by the uncertainty of scouting. For the first time, it is theoretically proved that neural-guided MCTS is guaranteed to find the optimal solution under the PC constraint. Experiments are conducted on the Atari $100$k benchmark with $26$ games, where GW-PCZero achieves $198\%$ mean human performance, higher than the state-of-the-art EfficientZero's $194\%$, while consuming only $25\%$ of EfficientZero's computational resources.
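As a rough illustration of the idea (the weights $w_i$ and path notation below are illustrative assumptions, not the paper's exact formulation), a weighted PC regularizer could pull the $f$ values of states $s_i$ along a searched path toward their weighted mean:

$$
\mathcal{L}_{\mathrm{PC}} \;=\; \sum_{i} w_i \bigl( f(s_i) - \bar{f} \bigr)^2,
\qquad
\bar{f} \;=\; \frac{\sum_{i} w_i \, f(s_i)}{\sum_{i} w_i},
$$

where larger weights would be assigned to $f$ estimates that scouting deems more reliable, so that noisier estimates contribute less to the consistency target.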

Cite

Text

Zhao et al. "Generalized Weighted Path Consistency for Mastering Atari Games." Neural Information Processing Systems, 2023.

Markdown

[Zhao et al. "Generalized Weighted Path Consistency for Mastering Atari Games." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/zhao2023neurips-generalized/)

BibTeX

@inproceedings{zhao2023neurips-generalized,
  title     = {{Generalized Weighted Path Consistency for Mastering Atari Games}},
  author    = {Zhao, Dengwei and Tu, Shikui and Xu, Lei},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/zhao2023neurips-generalized/}
}