Provably Efficient Offline Multi-Agent Reinforcement Learning via Strategy-Wise Bonus

Cui, Qiwen; Du, Simon S

NeurIPS 2022

Abstract

This paper considers offline multi-agent reinforcement learning. We propose the strategy-wise concentration principle, which directly builds a confidence interval for the joint strategy, in contrast to the point-wise concentration principle, which builds a confidence interval for each point in the joint action space. For two-player zero-sum Markov games, by exploiting the convexity of the strategy-wise bonus, we propose a computationally efficient algorithm whose sample complexity enjoys a better dependency on the number of actions than prior methods based on the point-wise bonus. Furthermore, for offline multi-agent general-sum Markov games, based on the strategy-wise bonus and a novel surrogate function, we give the first algorithm whose sample complexity scales only with $\sum_{i=1}^m A_i$, where $A_i$ is the number of actions of the $i$-th player and $m$ is the number of players. In sharp contrast, the sample complexity of methods based on the point-wise bonus would scale with the size of the joint action space, $\prod_{i=1}^m A_i$, due to the curse of multiagents. Lastly, all of our algorithms can naturally take a pre-specified strategy class $\Pi$ as input and output a strategy that is close to the best strategy in $\Pi$. In this setting, the sample complexity scales only with $\log |\Pi|$ instead of $\sum_{i=1}^m A_i$.
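To make the gap between the two scaling quantities concrete, the following minimal sketch compares the sum of the per-player action counts with the size of the joint action space. It is illustrative arithmetic only, not the paper's algorithm or its sample complexity bound, which depends on further factors not stated in the abstract. The function names and the example action counts are assumptions chosen for illustration.

```python
import math

def summed_action_count(action_counts):
    """Sum of per-player action counts: the quantity the strategy-wise
    bonus analysis is stated to scale with."""
    return sum(action_counts)

def joint_action_space_size(action_counts):
    """Product of per-player action counts: the size of the joint action
    space, which point-wise bonus methods are stated to scale with."""
    return math.prod(action_counts)

# Hypothetical game: m = 5 players with 10 actions each.
action_counts = [10] * 5

print(summed_action_count(action_counts))      # 50
print(joint_action_space_size(action_counts))  # 100000
```

In this hypothetical five-player game the sum is 50 while the product is 100000, and the ratio between them grows exponentially as players are added.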

Cite

Text

Cui and Du. "Provably Efficient Offline Multi-Agent Reinforcement Learning via Strategy-Wise Bonus." Neural Information Processing Systems, 2022.

Markdown

[Cui and Du. "Provably Efficient Offline Multi-Agent Reinforcement Learning via Strategy-Wise Bonus." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/cui2022neurips-provably/)

BibTeX

@inproceedings{cui2022neurips-provably,
  title     = {{Provably Efficient Offline Multi-Agent Reinforcement Learning via Strategy-Wise Bonus}},
  author    = {Cui, Qiwen and Du, Simon S},
  booktitle = {Neural Information Processing Systems},
  year      = {2022},
  url       = {https://mlanthology.org/neurips/2022/cui2022neurips-provably/}
}