Learning in Zero-Sum Team Markov Games Using Factored Value Functions
Abstract
We present a new method for learning good strategies in zero-sum Markov games in which each side is composed of multiple agents collaborating against an opposing team of agents. Our method requires full observability and communication during learning, but the learned policies can be executed in a distributed manner. The value function is represented as a factored linear architecture and its structure determines the necessary computational resources and communication bandwidth. This approach permits a tradeoff between simple representations with little or no communication between agents and complex, computationally intensive representations with extensive coordination between agents. Thus, we provide a principled means of using approximation to combat the exponential blowup in the joint action space of the participants. The approach is demonstrated with an example that shows the efficiency gains over naive enumeration.
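To make the idea of a factored linear value function concrete, here is a minimal sketch (not the authors' code) of the architecture the abstract describes: the joint team value Q(s, a) is approximated as a sum of local linear components, each depending only on a small subset of agents' actions. The class name, feature maps, and weights below are hypothetical illustrations; the naive enumeration in `best_joint_action` is exactly the exponential blowup that the factored structure is meant to avoid.

```python
import itertools
import numpy as np

class FactoredLinearQ:
    """Q(s, a) ~= sum_j w_j . phi_j(s, a[S_j]), where S_j indexes a few agents."""

    def __init__(self, agent_subsets, feature_fns, weights):
        # agent_subsets[j]: tuple of agent indices the j-th component depends on
        # feature_fns[j]:   maps (state, local_action_tuple) -> feature vector
        # weights[j]:       learned weight vector for the j-th component
        self.agent_subsets = agent_subsets
        self.feature_fns = feature_fns
        self.weights = weights

    def value(self, state, joint_action):
        # Sum of local components; each one only reads its own agents' actions.
        total = 0.0
        for subset, phi, w in zip(self.agent_subsets, self.feature_fns, self.weights):
            local_action = tuple(joint_action[i] for i in subset)
            total += float(np.dot(w, phi(state, local_action)))
        return total

    def best_joint_action(self, state, action_sets):
        # Naive enumeration over the full joint action space, shown for clarity.
        # The paper's point is that the factored structure admits cheaper
        # coordination schemes than this exhaustive search.
        best_a, best_q = None, -np.inf
        for joint in itertools.product(*action_sets):
            q = self.value(state, joint)
            if q > best_q:
                best_a, best_q = joint, q
        return best_a, best_q

if __name__ == "__main__":
    # Hypothetical 2-agent example: each component depends on a single agent,
    # so the learned policy needs no inter-agent communication at execution time.
    subsets = [(0,), (1,)]
    feats = [lambda s, a: np.array([s, a[0]]),
             lambda s, a: np.array([s * a[0], 1.0])]
    weights = [np.array([0.5, 1.0]), np.array([-0.2, 0.3])]
    q = FactoredLinearQ(subsets, feats, weights)
    print(q.best_joint_action(state=1.0, action_sets=[[0, 1], [0, 1]]))
```

With larger agent subsets the components overlap, which buys a richer representation at the cost of more coordination between the agents that share a component.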
Cite
Text
Lagoudakis and Parr. "Learning in Zero-Sum Team Markov Games Using Factored Value Functions." Neural Information Processing Systems, 2002.
Markdown
[Lagoudakis and Parr. "Learning in Zero-Sum Team Markov Games Using Factored Value Functions." Neural Information Processing Systems, 2002.](https://mlanthology.org/neurips/2002/lagoudakis2002neurips-learning/)
BibTeX
@inproceedings{lagoudakis2002neurips-learning,
  title     = {{Learning in Zero-Sum Team Markov Games Using Factored Value Functions}},
  author    = {Lagoudakis, Michail G. and Parr, Ronald},
  booktitle = {Neural Information Processing Systems},
  year      = {2002},
  pages     = {1659--1666},
  url       = {https://mlanthology.org/neurips/2002/lagoudakis2002neurips-learning/}
}