Policy Space Diversity for Non-Transitive Games
Abstract
Policy-Space Response Oracles (PSRO) is an influential algorithm framework for approximating a Nash Equilibrium (NE) in multi-agent non-transitive games. Many previous studies have tried to promote policy diversity in PSRO. A major weakness of existing diversity metrics is that a more diverse population (according to those metrics) does not necessarily yield a better approximation to an NE, as we prove in the paper. To alleviate this problem, we propose a new diversity metric whose improvement guarantees a better approximation to an NE. We also develop a practical and well-justified method to optimize our diversity metric using only state-action samples. By incorporating our diversity regularization into the best-response solving of PSRO, we obtain a new PSRO variant, Policy Space Diversity PSRO (PSD-PSRO). We present the convergence property of PSD-PSRO. Empirically, extensive experiments on single-state games, Leduc, and Goofspiel demonstrate that PSD-PSRO is more effective in producing significantly less exploitable policies than state-of-the-art PSRO variants.
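To make "less exploitable" concrete, here is a minimal sketch of how exploitability can be measured in a single-state, non-transitive game. It is not taken from the paper: rock-paper-scissors is used only as a familiar example, the function name `exploitability` is hypothetical, and the measure is assumed to be the sum of both players' best-response gains in a two-player zero-sum matrix game (some works divide this sum by two).

```python
import numpy as np

# Row player's payoff matrix for rock-paper-scissors (zero-sum, non-transitive).
# Rows and columns are ordered rock, paper, scissors. Illustrative example only.
RPS = np.array([
    [0.0, -1.0, 1.0],
    [1.0, 0.0, -1.0],
    [-1.0, 1.0, 0.0],
])


def exploitability(payoff, row_strategy, col_strategy):
    """Sum of both players' best-response values (assumed definition).

    In a zero-sum game the players' current payoffs cancel, so this sum
    equals the total gain available from unilateral deviation. It is zero
    exactly when the strategy pair is a Nash equilibrium.
    """
    # Best payoff the row player can obtain against the column strategy.
    row_br_value = np.max(payoff @ col_strategy)
    # Best payoff the column player can obtain against the row strategy
    # (the column player's payoff matrix is the negation of `payoff`).
    col_br_value = np.max(-(row_strategy @ payoff))
    return float(row_br_value + col_br_value)


uniform = np.ones(3) / 3
always_rock = np.array([1.0, 0.0, 0.0])

print(exploitability(RPS, uniform, uniform))          # equilibrium: no gain from deviating
print(exploitability(RPS, always_rock, always_rock))  # pure rock is fully exploitable
```

Under these assumptions the uniform strategy has zero exploitability, while always playing rock loses one unit per player to a best-responding opponent. Lower values indicate a closer approximation to an NE.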
Cite
Text
Yao et al. "Policy Space Diversity for Non-Transitive Games." Neural Information Processing Systems, 2023.
Markdown
[Yao et al. "Policy Space Diversity for Non-Transitive Games." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/yao2023neurips-policy/)
BibTeX
@inproceedings{yao2023neurips-policy,
title = {{Policy Space Diversity for Non-Transitive Games}},
author = {Yao, Jian and Liu, Weiming and Fu, Haobo and Yang, Yaodong and McAleer, Stephen and Fu, Qiang and Yang, Wei},
booktitle = {Neural Information Processing Systems},
year = {2023},
url = {https://mlanthology.org/neurips/2023/yao2023neurips-policy/}
}