Extending Q-Learning to General Adaptive Multi-Agent Systems
Abstract
Recent multi-agent extensions of Q-Learning require knowledge of other agents’ payoffs and Q-functions, and assume game-theoretic play at all times by all other agents. This paper proposes a fundamentally different approach, dubbed “Hyper-Q” Learning, in which values of mixed strategies rather than base actions are learned, and in which other agents’ strategies are estimated from observed actions via Bayesian inference. Hyper-Q may be effective against many different types of adaptive agents, even if they are persistently dynamic. Against certain broad categories of adaptation, it is argued that Hyper-Q may converge to exact optimal time-varying policies. In tests using Rock-Paper-Scissors, Hyper-Q learns to significantly exploit an Infinitesimal Gradient Ascent (IGA) player, as well as a Policy Hill Climber (PHC) player. Preliminary analysis of Hyper-Q against itself is also presented.
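The abstract names the two core ingredients of Hyper-Q: Q-learning over a discretized space of mixed strategies, with the opponent's mixed strategy estimated from observed actions via Bayesian inference. The sketch below illustrates that combination for Rock-Paper-Scissors; it is a minimal, hedged reconstruction, not the paper's implementation. The grid step, the Dirichlet prior, the learning parameters, and the fixed mixed-strategy opponent are all assumptions made for illustration (the paper evaluates against IGA and PHC learners, not a stationary opponent).

```python
"""Minimal sketch of a Hyper-Q-style learner for Rock-Paper-Scissors.

Assumptions (not from the paper): grid step 0.25, Dirichlet(1) prior,
epsilon-greedy exploration, and a fixed mixed-strategy opponent.
"""
import itertools
import random

import numpy as np

ACTIONS = 3  # rock, paper, scissors
# Row player's payoff: +1 win, -1 loss, 0 draw.
PAYOFF = np.array([[0, -1, 1],
                   [1, 0, -1],
                   [-1, 1, 0]])


def strategy_grid(step=0.25):
    """Discretize the probability simplex into a finite set of mixed strategies."""
    ticks = np.arange(0.0, 1.0 + 1e-9, step)
    return [np.array([p, q, 1.0 - p - q])
            for p, q in itertools.product(ticks, ticks)
            if p + q <= 1.0 + 1e-9]


GRID = strategy_grid()


def bayes_estimate(counts, prior=1.0):
    """Posterior mean of the opponent's mixed strategy (Dirichlet-multinomial)."""
    return (counts + prior) / (counts.sum() + prior * ACTIONS)


def nearest_grid_index(strategy):
    """Map a strategy estimate to the closest grid point (the Hyper-Q 'state')."""
    return int(np.argmin([np.linalg.norm(strategy - g) for g in GRID]))


def run(episodes=20000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Q is indexed by (estimated opponent strategy, own mixed strategy).
    Q = np.zeros((len(GRID), len(GRID)))
    opp_counts = np.zeros(ACTIONS)
    opponent = np.array([0.5, 0.3, 0.2])  # stand-in for an adaptive IGA/PHC opponent

    total = 0.0
    for _ in range(episodes):
        state = nearest_grid_index(bayes_estimate(opp_counts))
        # Epsilon-greedy choice over the discretized mixed strategies.
        if rng.random() < epsilon:
            my_idx = rng.randrange(len(GRID))
        else:
            my_idx = int(np.argmax(Q[state]))

        my_action = rng.choices(range(ACTIONS), weights=GRID[my_idx])[0]
        opp_action = rng.choices(range(ACTIONS), weights=opponent)[0]
        reward = PAYOFF[my_action, opp_action]
        total += reward

        opp_counts[opp_action] += 1
        next_state = nearest_grid_index(bayes_estimate(opp_counts))
        # Standard Q-learning update, but over mixed-strategy states and actions.
        Q[state, my_idx] += alpha * (reward + gamma * Q[next_state].max()
                                     - Q[state, my_idx])
    return total / episodes


if __name__ == "__main__":
    print("average reward vs. fixed mixed opponent:", run())
```

Against the fixed mixed opponent used here, the learner should drift toward the grid strategy that best responds to the Bayesian estimate; exploiting a genuinely adaptive opponent, as in the paper's IGA and PHC experiments, additionally relies on the opponent-strategy estimate tracking the opponent's changes over time.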
Cite
Text
Tesauro. "Extending Q-Learning to General Adaptive Multi-Agent Systems." Neural Information Processing Systems, 2003.Markdown
[Tesauro. "Extending Q-Learning to General Adaptive Multi-Agent Systems." Neural Information Processing Systems, 2003.](https://mlanthology.org/neurips/2003/tesauro2003neurips-extending/)BibTeX
@inproceedings{tesauro2003neurips-extending,
title = {{Extending Q-Learning to General Adaptive Multi-Agent Systems}},
author = {Tesauro, Gerald},
booktitle = {Neural Information Processing Systems},
year = {2003},
pages = {871-878},
url = {https://mlanthology.org/neurips/2003/tesauro2003neurips-extending/}
}