AlphaSnake: Policy Iteration on a Nondeterministic NP-Hard Markov Decision Process (Student Abstract)

Du, Kevin; Gemp, Ian; Wu, Yi; Wu, Yingying

doi:10.1609/AAAI.V37I13.26962

AlphaSnake: Policy Iteration on a Nondeterministic NP-Hard Markov Decision Process (Student Abstract)

Kevin Du, Ian Gemp, Yi Wu, Yingying Wu

AAAI 2023 pp. 16204-16205

doi:10.1609/AAAI.V37I13.26962 /aaai/2023/du2023aaai-alphasnake/

Abstract

Reinforcement learning has been used to approach well-known NP-hard combinatorial problems in graph theory. Among these, Hamiltonian cycle problems are exceptionally difficult to analyze, even when restricted to individual instances of structurally complex graphs. In this paper, we use Monte Carlo Tree Search (MCTS), the search algorithm behind many state-of-the-art reinforcement learning algorithms such as AlphaZero, to create autonomous agents that learn to play the game of Snake, a game centered on properties of Hamiltonian cycles on grid graphs. The game of Snake can be formulated as a single-player discounted Markov Decision Process (MDP), where the agent must behave optimally in a stochastic environment. Determining the optimal policy for Snake, defined as the policy that maximizes the probability of winning -- or win rate -- with higher priority and minimizes the expected number of time steps to win with lower priority, is conjectured to be NP-hard. Performance-wise, compared to prior work in the Snake game, our algorithm is the first to achieve a win rate over 0.5 (a uniform random policy achieves a win rate < 2.57 x 10^-15), demonstrating the versatility of AlphaZero in tackling NP-hard problems.

PDF AAAI Semantic Scholar

Cite

Text

Du et al. "AlphaSnake: Policy Iteration on a Nondeterministic NP-Hard Markov Decision Process (Student Abstract)." AAAI Conference on Artificial Intelligence, 2023. doi:10.1609/AAAI.V37I13.26962

Markdown

[Du et al. "AlphaSnake: Policy Iteration on a Nondeterministic NP-Hard Markov Decision Process (Student Abstract)." AAAI Conference on Artificial Intelligence, 2023.](https://mlanthology.org/aaai/2023/du2023aaai-alphasnake/) doi:10.1609/AAAI.V37I13.26962

BibTeX

@inproceedings{du2023aaai-alphasnake,
  title     = {{AlphaSnake: Policy Iteration on a Nondeterministic NP-Hard Markov Decision Process (Student Abstract)}},
  author    = {Du, Kevin and Gemp, Ian and Wu, Yi and Wu, Yingying},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2023},
  pages     = {16204-16205},
  doi       = {10.1609/AAAI.V37I13.26962},
  url       = {https://mlanthology.org/aaai/2023/du2023aaai-alphasnake/}
}