Solving Zero-Sum Markov Games with Continuous State via Spectral Dynamic Embedding
Abstract
In this paper, we propose a provably efficient natural policy gradient algorithm called Spectral Dynamic Embedding Policy Optimization (\SDEPO) for two-player zero-sum Markov games with continuous state space and finite action space. In the policy evaluation procedure of our algorithm, a novel kernel embedding method is employed to construct a finite-dimensional linear approximation to the state-action value function. We explicitly analyze the approximation error in policy evaluation and show that \SDEPO\ achieves an $\tilde{O}(\frac{1}{(1-\gamma)^3\epsilon})$ last-iterate convergence to the $\epsilon$-optimal Nash equilibrium, which is independent of the cardinality of the state space. This complexity matches the best-known results for global convergence of policy gradient algorithms in the single-agent setting. Moreover, we propose a practical variant of \SDEPO\ to handle continuous action spaces, and empirical results demonstrate the practical superiority of the proposed method.
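The abstract's core construction, a finite-dimensional embedding of the state in which the state-action value function is modeled linearly, can be illustrated with a short sketch. The snippet below uses random Fourier features for an RBF kernel and a ridge-regression fit of Bellman targets with one weight vector per (finite) action; the function names, feature dimension, bandwidth, and the RBF choice are illustrative assumptions and not the paper's exact spectral dynamic embedding.

```python
import numpy as np

# Minimal sketch (assumed setup, not the paper's implementation):
# random Fourier features approximating an RBF kernel give a finite-dimensional
# embedding phi(s), and the state-action value is modeled as Q(s, a) ~ phi(s)^T w[a].

def make_random_features(state_dim, num_features, bandwidth=1.0, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / bandwidth, size=(num_features, state_dim))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)

    def phi(states):
        # states: (n, state_dim) -> features: (n, num_features)
        return np.sqrt(2.0 / num_features) * np.cos(states @ W.T + b)

    return phi

def fit_linear_q(phi, states, actions, targets, num_actions, reg=1e-3):
    """Ridge regression of Bellman targets onto the embedding,
    fitting one weight vector per action (finite action space)."""
    features = phi(states)
    d = features.shape[1]
    weights = np.zeros((num_actions, d))
    for a in range(num_actions):
        mask = actions == a
        X, y = features[mask], targets[mask]
        weights[a] = np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ y)
    return weights  # Q(s, a) is approximated by phi(s) @ weights[a]
```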
Cite
Text
Zhou et al. "Solving Zero-Sum Markov Games with Continuous State via Spectral Dynamic Embedding." Neural Information Processing Systems, 2024. doi:10.52202/079017-2192
Markdown
[Zhou et al. "Solving Zero-Sum Markov Games with Continuous State via Spectral Dynamic Embedding." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/zhou2024neurips-solving/) doi:10.52202/079017-2192
BibTeX
@inproceedings{zhou2024neurips-solving,
title = {{Solving Zero-Sum Markov Games with Continuous State via Spectral Dynamic Embedding}},
author = {Zhou, Chenhao and Shen, Zebang and Zhang, Chao and Zhao, Hanbin and Qian, Hui},
booktitle = {Neural Information Processing Systems},
year = {2024},
doi = {10.52202/079017-2192},
url = {https://mlanthology.org/neurips/2024/zhou2024neurips-solving/}
}