Extending Q-Learning to General Adaptive Multi-Agent Systems

Abstract

Recent multi-agent extensions of Q-Learning require knowledge of other agents’ payoffs and Q-functions, and assume game-theoretic play at all times by all other agents. This paper proposes a fundamentally different approach, dubbed “Hyper-Q” Learning, in which values of mixed strategies rather than base actions are learned, and in which other agents’ strategies are estimated from observed actions via Bayesian inference. Hyper-Q may be effective against many different types of adaptive agents, even if they are persistently dynamic. Against certain broad categories of adaptation, it is argued that Hyper-Q may converge to exact optimal time-varying policies. In tests using Rock-Paper-Scissors, Hyper-Q learns to significantly exploit an Infinitesimal Gradient Ascent (IGA) player, as well as a Policy Hill Climber (PHC) player. Preliminary analysis of Hyper-Q against itself is also presented.

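The core mechanism the abstract describes can be illustrated with a short sketch: ordinary Q-learning is run over a discretized grid of mixed strategies, with the state given by an estimate of the opponent's current mixed strategy. The code below is a simplified, hypothetical illustration rather than the paper's implementation; in particular, the grid resolution, learning parameters, and the plain Dirichlet posterior-mean estimator stand in for the paper's Bayesian strategy estimation, and a fixed-mixture opponent stands in for the adaptive IGA and PHC opponents the paper actually tests against.

```python
import numpy as np

# Minimal sketch of the Hyper-Q idea on Rock-Paper-Scissors.  Assumptions not
# taken from the paper: the grid resolution N, the learning parameters, the
# Dirichlet posterior-mean opponent estimator, and a *fixed* mixed-strategy
# opponent in place of the adaptive IGA/PHC opponents studied in the paper.

N = 10  # resolution of the discretized mixed-strategy simplex
GRID = np.array([(i, j, N - i - j)
                 for i in range(N + 1)
                 for j in range(N + 1 - i)], dtype=float) / N
PAYOFF = np.array([[ 0, -1,  1],   # rock     vs (rock, paper, scissors)
                   [ 1,  0, -1],   # paper
                   [-1,  1,  0]])  # scissors

def nearest(p):
    """Index of the grid strategy closest to probability vector p."""
    return int(np.argmin(np.sum((GRID - p) ** 2, axis=1)))

alpha, gamma, eps = 0.1, 0.9, 0.1
Q = np.zeros((len(GRID), len(GRID)))   # Q[estimated opponent strategy, own strategy]
counts = np.ones(3)                    # Dirichlet(1,1,1) pseudo-counts for the opponent
rng = np.random.default_rng(0)
y_idx = nearest(counts / counts.sum()) # initial opponent-strategy estimate

for t in range(50_000):
    # Epsilon-greedy choice of an entire mixed strategy given the estimate.
    x_idx = (int(rng.integers(len(GRID))) if rng.random() < eps
             else int(np.argmax(Q[y_idx])))
    x = GRID[x_idx]

    # Sample base actions; the opponent here just plays a fixed mixture.
    a = rng.choice(3, p=x)
    o = rng.choice(3, p=[0.5, 0.25, 0.25])
    r = PAYOFF[a, o]

    # Update the opponent-strategy estimate (posterior mean of the Dirichlet).
    counts[o] += 1
    y_next = nearest(counts / counts.sum())

    # Ordinary Q-learning update over (estimated strategy, own strategy) pairs.
    Q[y_idx, x_idx] += alpha * (r + gamma * Q[y_next].max() - Q[y_idx, x_idx])
    y_idx = y_next
```

In this toy run the greedy mixed strategy should drift toward pure paper, the best response to the fixed [0.5, 0.25, 0.25] opponent; the interesting cases treated in the paper arise when the opponent's strategy itself keeps changing.
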
Cite

Text

Tesauro. "Extending Q-Learning to General Adaptive Multi-Agent Systems." Neural Information Processing Systems, 2003.

Markdown

[Tesauro. "Extending Q-Learning to General Adaptive Multi-Agent Systems." Neural Information Processing Systems, 2003.](https://mlanthology.org/neurips/2003/tesauro2003neurips-extending/)

BibTeX

@inproceedings{tesauro2003neurips-extending,
  title     = {{Extending Q-Learning to General Adaptive Multi-Agent Systems}},
  author    = {Tesauro, Gerald},
  booktitle = {Neural Information Processing Systems},
  year      = {2003},
  pages     = {871--878},
  url       = {https://mlanthology.org/neurips/2003/tesauro2003neurips-extending/}
}