Solving Zero-Sum Markov Games with Continuous State via Spectral Dynamic Embedding

Abstract

In this paper, we propose a provably efficient natural policy gradient algorithm called Spectral Dynamic Embedding Policy Optimization (\SDEPO) for two-player zero-sum Markov games with continuous state space and finite action space. In the policy evaluation procedure of our algorithm, a novel kernel embedding method is employed to construct a finite-dimensional linear approximation of the state-action value function. We explicitly analyze the approximation error in policy evaluation and show that \SDEPO\ achieves an $\tilde{O}(\frac{1}{(1-\gamma)^3\epsilon})$ last-iterate convergence to an $\epsilon$-optimal Nash equilibrium, which is independent of the cardinality of the state space. This complexity matches the best-known results for the global convergence of policy gradient algorithms in the single-agent setting. Moreover, we propose a practical variant of \SDEPO\ to handle continuous action spaces, and empirical results demonstrate the practical superiority of the proposed method.
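The policy-evaluation idea in the abstract can be illustrated with a small sketch. The snippet below is a minimal illustration, not the authors' implementation: it assumes random Fourier features of a Gaussian kernel as the finite-dimensional embedding phi(s, a), fits Q(s, a) ≈ w^T phi(s, a) by ridge-regularized least squares from sampled transitions, and applies a softmax natural-policy-gradient step over a finite action set (only one player's update is shown for brevity). All names, dimensions, and step sizes are hypothetical.

import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, N_ACTIONS, N_FEATURES = 4, 3, 128
GAMMA, RIDGE, STEP = 0.95, 1e-3, 1.0

# Random Fourier features: a finite-dimensional embedding of an RBF kernel
# over the concatenated (state, one-hot action) vector.
W = rng.normal(size=(N_FEATURES, STATE_DIM + N_ACTIONS))
b = rng.uniform(0.0, 2 * np.pi, size=N_FEATURES)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def phi(state, action):
    """Embedding phi(s, a); the value function Q is modeled as linear in these features."""
    x = np.concatenate([state, np.eye(N_ACTIONS)[action]])
    return np.sqrt(2.0 / N_FEATURES) * np.cos(W @ x + b)

def evaluate(transitions, logits):
    """Ridge-regularized LSTD-style fit of w with Q(s, a) ~ w^T phi(s, a).

    `transitions` is a list of (s, a, r, s_next) samples collected under the
    current policies; `logits` are the acting player's action preferences
    (a single fixed policy is assumed here for simplicity).
    """
    A = RIDGE * np.eye(N_FEATURES)
    c = np.zeros(N_FEATURES)
    probs = softmax(logits)
    for s, a, r, s_next in transitions:
        f = phi(s, a)
        f_next = sum(p * phi(s_next, a2) for a2, p in enumerate(probs))
        A += np.outer(f, f - GAMMA * f_next)
        c += r * f
    return np.linalg.solve(A, c)

def npg_step(logits, q_values):
    """Softmax natural-policy-gradient update: shift logits by STEP * Q(s, .)."""
    return logits + STEP * q_values

In this simplified form, the embedding dimension N_FEATURES controls the approximation error of the linear value model, which is the quantity the paper analyzes explicitly when bounding the error of policy evaluation.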

Cite

Text

Zhou et al. "Solving Zero-Sum Markov Games with Continuous State via Spectral Dynamic Embedding." Neural Information Processing Systems, 2024. doi:10.52202/079017-2192

Markdown

[Zhou et al. "Solving Zero-Sum Markov Games with Continuous State via Spectral Dynamic Embedding." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/zhou2024neurips-solving/) doi:10.52202/079017-2192

BibTeX

@inproceedings{zhou2024neurips-solving,
  title     = {{Solving Zero-Sum Markov Games with Continuous State via Spectral Dynamic Embedding}},
  author    = {Zhou, Chenhao and Shen, Zebang and Zhang, Chao and Zhao, Hanbin and Qian, Hui},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-2192},
  url       = {https://mlanthology.org/neurips/2024/zhou2024neurips-solving/}
}