Limitations in Planning Ability in AlphaZero

Abstract

AlphaZero, a deep reinforcement learning algorithm, has achieved superhuman performance in complex games like Chess and Go. However, its strategic planning ability beyond winning games remains unclear. We investigated this question using 4-in-a-row, a game used to study human planning, analyzing AlphaZero's feature learning and puzzle-solving abilities. Despite strong gameplay, AlphaZero exhibited a 93% failure rate on puzzles. Our feature analysis showed that the strategies it learned during training lacked certain critical human-like features. Adding a human-inspired cognitive value function to its policy and value outputs led to a 15% improvement in puzzle-solving accuracy. Our findings highlight the potential for human insights to enhance AI's strategic planning beyond self-play.
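
The abstract's intervention, folding a human-inspired cognitive value function into the network's outputs, could look roughly like the sketch below. This is a minimal illustration under assumptions not stated in the abstract: the feature set, the weights, and the blend parameter w are all hypothetical and do not reflect the authors' actual implementation.

# Minimal sketch (not the authors' code): blending a hand-crafted,
# human-inspired value function with AlphaZero's value-head output.
import numpy as np

# Hypothetical weights over 4-in-a-row board features (e.g., counts of
# open two- and three-in-a-row patterns); illustrative values only.
FEATURE_WEIGHTS = np.array([0.2, 0.6, 1.5, 0.9])

def cognitive_value(features: np.ndarray) -> float:
    """Heuristic value in [-1, 1] computed from hand-crafted board features."""
    return float(np.tanh(features @ FEATURE_WEIGHTS))

def blended_value(network_value: float, features: np.ndarray, w: float = 0.5) -> float:
    """Convex combination of the network's value output and the heuristic value.
    The blend weight w is a hypothetical hyperparameter."""
    return (1.0 - w) * network_value + w * cognitive_value(features)

# Usage: at leaf evaluation during search, replace the raw value-head output
# v_net with blended_value(v_net, board_features(state)).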

Cite

Text

Lin et al. "Limitations in Planning Ability in AlphaZero." NeurIPS 2024 Workshops: Behavioral_ML, 2024.

Markdown

[Lin et al. "Limitations in Planning Ability in AlphaZero." NeurIPS 2024 Workshops: Behavioral_ML, 2024.](https://mlanthology.org/neuripsw/2024/lin2024neuripsw-limitations/)

BibTeX

@inproceedings{lin2024neuripsw-limitations,
  title     = {{Limitations in Planning Ability in AlphaZero}},
  author    = {Lin, Daisy Xinlei and Lake, Brenden and Ma, Wei Ji},
  booktitle = {NeurIPS 2024 Workshops: Behavioral_ML},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/lin2024neuripsw-limitations/}
}