Model-Free Opponent Shaping

Abstract

In general-sum games, the interaction of self-interested learning agents commonly leads to collectively worst-case outcomes, such as defect-defect in the iterated prisoner’s dilemma (IPD). To overcome this, some methods, such as Learning with Opponent-Learning Awareness (LOLA), directly shape the learning process of their opponents. However, these methods are myopic, since only a small number of steps can be anticipated; asymmetric, since they treat other agents as naive learners; and reliant on higher-order derivatives, which are calculated through white-box access to an opponent’s differentiable learning algorithm. To address these issues, we propose Model-Free Opponent Shaping (M-FOS). M-FOS learns in a meta-game in which each meta-step is an episode of the underlying game. The meta-state consists of the policies in the underlying game, and the meta-policy produces a new policy to be used in the next episode. M-FOS then uses generic model-free optimisation methods to learn meta-policies that accomplish long-horizon opponent shaping. Empirically, M-FOS near-optimally exploits naive learners and other, more sophisticated algorithms from the literature. For example, to the best of our knowledge, it is the first method to learn the well-known Zero-Determinant (ZD) extortion strategy in the IPD. In the same settings, M-FOS leads to socially optimal outcomes under meta-self-play. Finally, we show that M-FOS can be scaled to high-dimensional settings.
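To make the meta-game formulation concrete, below is a minimal illustrative sketch (not the paper's implementation) of the M-FOS loop: each meta-step is one episode of an inner game, the meta-state contains both players' inner policies, and the meta-action is the shaping agent's next inner policy. For simplicity the inner game here is assumed to be a one-shot prisoner's dilemma rather than the IPD, the opponent is a naive gradient learner, and names such as `meta_policy` and `inner_payoff` are hypothetical; training the meta-policy itself (e.g. with ES or PPO, as in the paper) is omitted.

```python
import numpy as np

# Prisoner's dilemma payoffs for player 1 (rows: C, D); player 2's matrix is the transpose.
P1 = np.array([[-1.0, -3.0],
               [ 0.0, -2.0]])
P2 = P1.T

def inner_payoff(p1_coop, p2_coop):
    """Expected payoffs when each player cooperates with the given probability."""
    d1 = np.array([p1_coop, 1.0 - p1_coop])
    d2 = np.array([p2_coop, 1.0 - p2_coop])
    return d1 @ P1 @ d2, d1 @ P2 @ d2

def meta_episode(meta_policy, inner_episodes=50, opp_lr=1.0):
    """One meta-episode: each meta-step corresponds to one inner-game episode."""
    theta_self = 0.0   # shaping agent's inner-policy logit (probability of cooperating)
    theta_opp = 0.0    # naive opponent's inner-policy logit
    meta_return = 0.0
    for _ in range(inner_episodes):
        p_self = 1.0 / (1.0 + np.exp(-theta_self))
        p_opp = 1.0 / (1.0 + np.exp(-theta_opp))
        r_self, r_opp = inner_payoff(p_self, p_opp)
        meta_return += r_self
        # Meta-state: both inner policies; meta-action: the next inner policy.
        theta_self = meta_policy(np.array([theta_self, theta_opp]))
        # Naive opponent: one gradient-ascent step on its own payoff (numerical gradient).
        eps = 1e-4
        p_opp_eps = 1.0 / (1.0 + np.exp(-(theta_opp + eps)))
        _, r_opp_eps = inner_payoff(p_self, p_opp_eps)
        theta_opp += opp_lr * (r_opp_eps - r_opp) / eps
    return meta_return

# Example usage: a fixed "always defect" meta-policy as a baseline; in M-FOS the
# meta-policy would instead be learned with a generic model-free optimiser.
print(meta_episode(lambda meta_state: -5.0))
```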

Cite

Text

Lu et al. "Model-Free Opponent Shaping." International Conference on Machine Learning, 2022.

Markdown

[Lu et al. "Model-Free Opponent Shaping." International Conference on Machine Learning, 2022.](https://mlanthology.org/icml/2022/lu2022icml-modelfree/)

BibTeX

@inproceedings{lu2022icml-modelfree,
  title     = {{Model-Free Opponent Shaping}},
  author    = {Lu, Christopher and Willi, Timon and De Witt, Christian A Schroeder and Foerster, Jakob},
  booktitle = {International Conference on Machine Learning},
  year      = {2022},
  pages     = {14398--14411},
  volume    = {162},
  url       = {https://mlanthology.org/icml/2022/lu2022icml-modelfree/}
}