Converging to Unexploitable Policies in Continuous Control Adversarial Games

Abstract

Fictitious Self-Play (FSP) is an iterative algorithm capable of learning approximate Nash equilibria in many types of two-player zero-sum games. In FSP, at each iteration, a best response is learned to the opponent's meta-strategy, i.e., its average strategy over past iterations. However, FSP can be slow to converge in continuous control games in which two embodied agents compete against one another. We propose Adaptive FSP (AdaptFSP), a deep reinforcement learning (RL) algorithm inspired by FSP. The main idea is that instead of training a best response only against the meta-strategy, we additionally train against an adaptive deep RL agent that can adapt to the best response. In four test domains, two tabular cases (random normal-form matrix games and Leduc poker) and two continuous control tasks (Thou Shall Not Pass and a soccer environment), we show that AdaptFSP achieves lower exploitability more quickly than vanilla FSP.
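To make the best-response-to-meta-strategy loop and the exploitability metric concrete, here is a minimal sketch of plain tabular fictitious play on rock-paper-scissors. This is an illustration of the underlying idea only, not the paper's AdaptFSP algorithm: it omits the adaptive RL opponent and the learned (rather than exact) best responses, and all function names here are my own.

```python
import numpy as np

# Row player's payoff matrix for rock-paper-scissors (zero-sum game).
A = np.array([[0, -1,  1],
              [1,  0, -1],
              [-1, 1,  0]], dtype=float)

def exploitability(p, q, A):
    """Sum of both players' best-response values against the profile (p, q).

    This is 0 exactly at a Nash equilibrium and positive otherwise.
    """
    br_row = (A @ q).max()      # row player's best-response value vs q
    br_col = (-A.T @ p).max()   # column player's best-response value vs p
    return br_row + br_col

def fictitious_play(A, iters=2000):
    """Each iteration, play an exact best response to the opponent's
    meta-strategy (empirical average of past play), then update the average."""
    n, m = A.shape
    row_counts = np.ones(n)  # initialize meta-strategies to uniform
    col_counts = np.ones(m)
    for _ in range(iters):
        p = row_counts / row_counts.sum()  # row meta-strategy
        q = col_counts / col_counts.sum()  # column meta-strategy
        row_counts[np.argmax(A @ q)] += 1      # best response to opponent meta
        col_counts[np.argmax(-A.T @ p)] += 1
    p = row_counts / row_counts.sum()
    q = col_counts / col_counts.sum()
    return p, q, exploitability(p, q, A)

p, q, expl = fictitious_play(A)
```

The meta-strategies converge toward the uniform equilibrium of rock-paper-scissors, and the exploitability of the averages shrinks with more iterations; AdaptFSP's modification, per the abstract, is to also train the best response against an opponent that adapts to it, rather than against the frozen meta-strategy alone.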

Cite

Text

Goldstein and Brown. "Converging to Unexploitable Policies in Continuous Control Adversarial Games." NeurIPS 2022 Workshops: DeepRL, 2022.

Markdown

[Goldstein and Brown. "Converging to Unexploitable Policies in Continuous Control Adversarial Games." NeurIPS 2022 Workshops: DeepRL, 2022.](https://mlanthology.org/neuripsw/2022/goldstein2022neuripsw-converging/)

BibTeX

@inproceedings{goldstein2022neuripsw-converging,
  title     = {{Converging to Unexploitable Policies in Continuous Control Adversarial Games}},
  author    = {Goldstein, Maxwell and Brown, Noam},
  booktitle = {NeurIPS 2022 Workshops: DeepRL},
  year      = {2022},
  url       = {https://mlanthology.org/neuripsw/2022/goldstein2022neuripsw-converging/}
}