Learning to Play General-Sum Games Against Multiple Boundedly Rational Agents

Abstract

We study the problem of training a principal in a multi-agent general-sum game using reinforcement learning (RL). Learning a robust principal policy requires anticipating the worst possible strategic responses of other agents, which is generally NP-hard. However, we show that no-regret dynamics can identify these worst-case responses in polynomial time in smooth games. We propose a framework that uses this policy evaluation method for efficiently learning a robust principal policy using RL. This framework can be extended to provide robustness against boundedly rational agents as well. Our motivating application is automated mechanism design: we empirically demonstrate that our framework learns robust mechanisms in both matrix games and complex spatiotemporal games. In particular, we learn a dynamic tax policy that improves the welfare of a simulated trade-and-barter economy by 15%, even when facing previously unseen boundedly rational RL taxpayers.
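As a rough illustration of the no-regret dynamics the abstract refers to, the sketch below runs Hedge (multiplicative weights), a standard no-regret algorithm, for a single agent responding to a fixed principal policy in a toy matrix game. This is only a minimal stand-in for the policy-evaluation step described above, not the paper's actual method; the payoff vector and hyperparameters are illustrative assumptions.

```python
import numpy as np

def hedge(payoffs, rounds=500, eta=0.1):
    """Hedge (multiplicative weights) no-regret dynamics for one agent.

    payoffs : (n_actions,) array of the agent's payoff per action against
              a fixed principal policy (illustrative toy setting).
    Returns the agent's average mixed strategy over all rounds, which for
    no-regret dynamics converges toward a best response here.
    """
    n = len(payoffs)
    weights = np.ones(n)          # uniform initial weights
    avg = np.zeros(n)
    for _ in range(rounds):
        strategy = weights / weights.sum()
        avg += strategy
        # Full-information update: exponentially upweight higher-payoff actions.
        weights *= np.exp(eta * payoffs)
    return avg / rounds

# Toy example: action 1 strictly dominates, so play concentrates on it.
probs = hedge(np.array([0.2, 1.0, 0.5]))
```

In the paper's setting the agents' payoffs depend on the principal's learned policy, so dynamics like this serve as a tractable inner loop for evaluating how strategic agents would respond.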

Cite

Text

Zhao et al. "Learning to Play General-Sum Games Against Multiple Boundedly Rational Agents." AAAI Conference on Artificial Intelligence, 2023. doi:10.1609/AAAI.V37I10.26391

Markdown

[Zhao et al. "Learning to Play General-Sum Games Against Multiple Boundedly Rational Agents." AAAI Conference on Artificial Intelligence, 2023.](https://mlanthology.org/aaai/2023/zhao2023aaai-learning/) doi:10.1609/AAAI.V37I10.26391

BibTeX

@inproceedings{zhao2023aaai-learning,
  title     = {{Learning to Play General-Sum Games Against Multiple Boundedly Rational Agents}},
  author    = {Zhao, Eric and Trott, Alexander R. and Xiong, Caiming and Zheng, Stephan},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2023},
  pages     = {11781--11789},
  doi       = {10.1609/AAAI.V37I10.26391},
  url       = {https://mlanthology.org/aaai/2023/zhao2023aaai-learning/}
}