Can Reinforcement Learning Efficiently Find Stackelberg-Nash Equilibria in General-Sum Markov Games with Myopic Followers?

Abstract

We study multi-player general-sum Markov games with one of the players designated as the leader and the rest regarded as followers. In particular, we focus on the class of games where the followers are myopic, i.e., the followers aim to maximize their instantaneous rewards. For such a game, our goal is to find the Stackelberg-Nash equilibrium (SNE), which is a policy pair $(\pi^*, \nu^*)$ such that (i) $\pi^*$ is the optimal policy for the leader when the followers always play their best response, and (ii) $\nu^*$ is the best response policy of the followers, which is a Nash equilibrium of the followers' game induced by $\pi^*$. We develop sample-efficient reinforcement learning (RL) algorithms for finding the SNE in both the online and offline settings. Our algorithms are optimistic and pessimistic variants of least-squares value iteration, respectively, and readily incorporate function approximation tools for handling large state spaces. Furthermore, for the case with linear function approximation, we prove that our algorithms achieve sublinear regret and suboptimality under the online and offline setups, respectively. To the best of our knowledge, we establish the first provably efficient RL algorithms for finding the SNE in general-sum Markov games with myopic followers.
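To make the optimistic least-squares value iteration mentioned above concrete, here is a minimal sketch of its core computation under a linear function approximation assumption: Q-values are fit by ridge regression on state-action features, and an elliptical-confidence bonus is added for optimism. All variable names, dimensions, and the bonus scale `beta` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n = 4, 200          # feature dimension, number of observed transitions (assumed)
lam, beta = 1.0, 0.5   # ridge parameter and bonus scale (illustrative values)

# Synthetic dataset: features of visited (state, action) pairs and
# regression targets of the form r + V_next(s') from a later-stage estimate.
Phi = rng.normal(size=(n, d))
targets = Phi @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# Ridge regression: w = (Phi^T Phi + lam I)^{-1} Phi^T y
Lambda = Phi.T @ Phi + lam * np.eye(d)
w = np.linalg.solve(Lambda, Phi.T @ targets)

def optimistic_q(phi):
    """Estimated Q-value plus a nonnegative UCB bonus beta * sqrt(phi^T Lambda^{-1} phi)."""
    bonus = beta * np.sqrt(phi @ np.linalg.solve(Lambda, phi))
    return phi @ w + bonus

phi_new = rng.normal(size=d)
q_opt = optimistic_q(phi_new)      # optimistic estimate
q_plain = phi_new @ w              # plain least-squares estimate
```

In the pessimistic (offline) variant, the same bonus would instead be subtracted, penalizing state-action pairs poorly covered by the dataset.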

Cite

Text

Zhong et al. "Can Reinforcement Learning Efficiently Find Stackelberg-Nash Equilibria in General-Sum Markov Games with Myopic Followers?" ICLR 2022 Workshops: GMS, 2022.

Markdown

[Zhong et al. "Can Reinforcement Learning Efficiently Find Stackelberg-Nash Equilibria in General-Sum Markov Games with Myopic Followers?" ICLR 2022 Workshops: GMS, 2022.](https://mlanthology.org/iclrw/2022/zhong2022iclrw-reinforcement/)

BibTeX

@inproceedings{zhong2022iclrw-reinforcement,
  title     = {{Can Reinforcement Learning Efficiently Find Stackelberg-Nash Equilibria in General-Sum Markov Games with Myopic Followers?}},
  author    = {Zhong, Han and Yang, Zhuoran and Wang, Zhaoran and Jordan, Michael},
  booktitle = {ICLR 2022 Workshops: GMS},
  year      = {2022},
  url       = {https://mlanthology.org/iclrw/2022/zhong2022iclrw-reinforcement/}
}