Discovering Diverse Multi-Agent Strategic Behavior via Reward Randomization

Abstract

We propose a simple, general, and effective technique, Reward Randomization, for discovering diverse strategic policies in complex multi-agent games. Combining reward randomization with policy gradient, we derive a new algorithm, Reward-Randomized Policy Gradient (RPG). RPG discovers a set of distinctive, human-interpretable strategies in challenging temporal trust dilemmas, including grid-world games and the real-world game Agar.io, where multiple equilibria exist but standard multi-agent policy gradient algorithms always converge to a fixed one with a sub-optimal payoff for every player, even when using state-of-the-art exploration techniques. Furthermore, with the set of diverse strategies from RPG, we can (1) achieve higher payoffs by fine-tuning the best policy from the set; and (2) obtain an adaptive agent by using this set of strategies as its training opponents.
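The core idea in the abstract can be sketched in a few lines: train policies under randomly perturbed reward functions, then evaluate every learned policy under the original reward and keep the best. The toy below is an illustrative assumption, not the paper's implementation: it uses a symmetric 2x2 Stag Hunt matrix game (a trust dilemma with two equilibria) and plain REINFORCE self-play; all hyperparameters and function names are hypothetical.

```python
import numpy as np

# Illustrative sketch of reward randomization on a Stag Hunt matrix game.
# Rows/columns index the actions (Stag, Hare); entries are the row
# player's payoff. The game is symmetric. NOT the paper's code.
rng = np.random.default_rng(0)

R = np.array([[4.0, 0.0],
              [3.0, 1.0]])  # original reward: (Stag, Stag) is the high-payoff equilibrium

def policy_gradient_selfplay(reward, steps=2000, lr=0.5):
    """Train a softmax policy over {Stag, Hare} via REINFORCE self-play."""
    theta = np.zeros(2)
    for _ in range(steps):
        p = np.exp(theta - theta.max()); p /= p.sum()
        a1 = rng.choice(2, p=p)          # our action
        a2 = rng.choice(2, p=p)          # opponent samples from the same policy
        r = reward[a1, a2]
        grad = -p.copy(); grad[a1] += 1.0   # d log pi(a1) / d theta
        theta += lr * r * grad
    p = np.exp(theta - theta.max()); p /= p.sum()
    return p

def expected_payoff(p, reward):
    """Expected self-play payoff when both players use policy p."""
    return p @ reward @ p

# Reward randomization: train under random payoff perturbations,
# then evaluate every candidate under the ORIGINAL reward R.
candidates = [policy_gradient_selfplay(R)]          # baseline: train on R itself
for _ in range(10):
    R_tilde = R + rng.uniform(-4.0, 4.0, size=R.shape)
    candidates.append(policy_gradient_selfplay(R_tilde))

best = max(candidates, key=lambda p: expected_payoff(p, R))
print("best policy (P[Stag], P[Hare]):", np.round(best, 2))
print("payoff under original reward:", round(float(expected_payoff(best, R)), 2))
```

The point of the perturbations is that some randomized rewards make the risky (Stag, Stag) equilibrium easy to find; evaluating all candidates under the original reward then recovers a higher-payoff strategy than training on the original reward alone, which tends to settle on the safe but sub-optimal (Hare, Hare) outcome.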

Cite

Text

Tang et al. "Discovering Diverse Multi-Agent Strategic Behavior via Reward Randomization." International Conference on Learning Representations, 2021.

Markdown

[Tang et al. "Discovering Diverse Multi-Agent Strategic Behavior via Reward Randomization." International Conference on Learning Representations, 2021.](https://mlanthology.org/iclr/2021/tang2021iclr-discovering/)

BibTeX

@inproceedings{tang2021iclr-discovering,
  title     = {{Discovering Diverse Multi-Agent Strategic Behavior via Reward Randomization}},
  author    = {Tang, Zhenggang and Yu, Chao and Chen, Boyuan and Xu, Huazhe and Wang, Xiaolong and Fang, Fei and Du, Simon Shaolei and Wang, Yu and Wu, Yi},
  booktitle = {International Conference on Learning Representations},
  year      = {2021},
  url       = {https://mlanthology.org/iclr/2021/tang2021iclr-discovering/}
}