Learning to Compete, Compromise, and Cooperate in Repeated General-Sum Games

Abstract

Learning algorithms often obtain relatively low average payoffs in repeated general-sum games played with other learning agents due to a focus on myopic best-response and one-shot Nash equilibrium (NE) strategies. A less myopic approach focuses instead on NEs of the repeated game, which suggests that (at the least) a learning agent should possess two properties. First, an agent should never learn to play a strategy that produces average payoffs less than the minimax value of the game. Second, an agent should learn to cooperate/compromise when beneficial. No learning algorithm from the literature is known to possess both of these properties. We present a reinforcement learning algorithm (M-Qubed) that provably satisfies the first property and empirically displays (in self-play) the second property in a wide range of games.
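The minimax (security) value referenced in the first property is the payoff a player can guarantee regardless of the opponent's play. As an illustrative sketch (not from the paper): for games where a pure strategy suffices as the security strategy, such as the Prisoner's Dilemma, it can be computed by maximizing over the row player's actions the worst-case column payoff. In general, mixed strategies are required and the value is found by linear programming.

```python
def pure_maximin(payoffs):
    """Security (maximin) value over pure strategies: the best
    payoff the row player can guarantee against any opponent action.
    Sufficient for games like the Prisoner's Dilemma, where the
    security strategy is pure; general games need mixed strategies."""
    return max(min(row) for row in payoffs)

# Illustrative Prisoner's Dilemma payoffs for the row player
# (rows: Cooperate, Defect; columns: opponent plays C, D).
pd = [[3, 0],   # Cooperate
      [5, 1]]   # Defect

security_value = pure_maximin(pd)  # Defect guarantees a payoff of 1
```

The first property then says an agent's average payoff should never fall below this security value (1 here), while the second property asks it to learn mutual cooperation (payoff 3) when the opponent makes that possible.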

Cite

Text

Crandall and Goodrich. "Learning to Compete, Compromise, and Cooperate in Repeated General-Sum Games." International Conference on Machine Learning, 2005. doi:10.1145/1102351.1102372

Markdown

[Crandall and Goodrich. "Learning to Compete, Compromise, and Cooperate in Repeated General-Sum Games." International Conference on Machine Learning, 2005.](https://mlanthology.org/icml/2005/crandall2005icml-learning/) doi:10.1145/1102351.1102372

BibTeX

@inproceedings{crandall2005icml-learning,
  title     = {{Learning to Compete, Compromise, and Cooperate in Repeated General-Sum Games}},
  author    = {Crandall, Jacob W. and Goodrich, Michael A.},
  booktitle = {International Conference on Machine Learning},
  year      = {2005},
  pages     = {161--168},
  doi       = {10.1145/1102351.1102372},
  url       = {https://mlanthology.org/icml/2005/crandall2005icml-learning/}
}