Learning to Compete, Compromise, and Cooperate in Repeated General-Sum Games
Abstract
Learning algorithms often obtain relatively low average payoffs in repeated general-sum games played against other learning agents, due to their focus on myopic best responses and one-shot Nash equilibrium (NE) strategies. A less myopic approach focuses instead on NEs of the repeated game, which suggests that (at the least) a learning agent should possess two properties. First, an agent should never learn to play a strategy that produces average payoffs below the minimax value of the game. Second, an agent should learn to cooperate or compromise when doing so is beneficial. No learning algorithm from the literature is known to possess both of these properties. We present a reinforcement learning algorithm (M-Qubed) that provably satisfies the first property and empirically displays the second property (in self play) in a wide range of games.
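The first property is a security guarantee: the agent's average payoff should never fall below the game's minimax (maximin) value. As a minimal, hypothetical sketch (not the authors' M-Qubed implementation), the Python snippet below computes a row player's security value in a two-player matrix game by linear programming; the security_value helper and the NumPy/SciPy dependencies are illustrative assumptions.

import numpy as np
from scipy.optimize import linprog

def security_value(R):
    """Maximin ("security") value of the row player whose payoff matrix R
    has rows = own actions and columns = opponent actions, allowing mixed
    strategies. This is the payoff floor that the first property refers to."""
    n_rows, n_cols = R.shape
    # Variables: [x_1, ..., x_n, v]; maximize v  <=>  minimize -v.
    c = np.zeros(n_rows + 1)
    c[-1] = -1.0
    # For every opponent column j: v - sum_a x_a * R[a, j] <= 0.
    A_ub = np.hstack([-R.T, np.ones((n_cols, 1))])
    b_ub = np.zeros(n_cols)
    # The mixed strategy x must be a probability distribution.
    A_eq = np.append(np.ones(n_rows), 0.0).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * n_rows + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:-1], res.x[-1]  # (maximin strategy, security value)

# Row player's prisoner's dilemma payoffs for actions (C, D) vs. (C, D):
R = np.array([[3.0, 0.0],
              [5.0, 1.0]])
strategy, v = security_value(R)
print(strategy, v)  # defecting guarantees a payoff of at least 1, so v = 1

In the prisoner's dilemma this floor (v = 1, from defection) is exactly what a purely security-minded learner settles for; the second property asks the learner to do better, reaching the cooperative payoff of 3 when the opponent reciprocates.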
Cite
Text
Crandall and Goodrich. "Learning to Compete, Compromise, and Cooperate in Repeated General-Sum Games." International Conference on Machine Learning, 2005. doi:10.1145/1102351.1102372
Markdown
[Crandall and Goodrich. "Learning to Compete, Compromise, and Cooperate in Repeated General-Sum Games." International Conference on Machine Learning, 2005.](https://mlanthology.org/icml/2005/crandall2005icml-learning/) doi:10.1145/1102351.1102372
BibTeX
@inproceedings{crandall2005icml-learning,
title = {{Learning to Compete, Compromise, and Cooperate in Repeated General-Sum Games}},
author = {Crandall, Jacob W. and Goodrich, Michael A.},
booktitle = {International Conference on Machine Learning},
year = {2005},
pages = {161-168},
doi = {10.1145/1102351.1102372},
url = {https://mlanthology.org/icml/2005/crandall2005icml-learning/}
}