Learning in Zero-Sum Team Markov Games Using Factored Value Functions

Abstract

We present a new method for learning good strategies in zero-sum Markov games in which each side is composed of multiple agents collaborating against an opposing team of agents. Our method requires full observability and communication during learning, but the learned policies can be executed in a distributed manner. The value function is represented as a factored linear architecture and its structure determines the necessary computational resources and communication bandwidth. This approach permits a tradeoff between simple representations with little or no communication between agents and complex, computationally intensive representations with extensive coordination between agents. Thus, we provide a principled means of using approximation to combat the exponential blowup in the joint action space of the participants. The approach is demonstrated with an example that shows the efficiency gains over naive enumeration.
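To make the factored representation concrete, below is a minimal Python sketch of a factored linear Q-function in which each local component depends only on a small subset of teammates. All names, sizes, and the disjoint-scope assumption are illustrative, not the paper's actual implementation. With disjoint scopes the greedy joint action decomposes across components, so action selection enumerates each component's small scope instead of the full joint action space; overlapping scopes would require coordination between agents (e.g., variable elimination over a coordination structure), which this sketch omits.

import itertools
import numpy as np

# Hypothetical setup: 4 teammates, 3 actions each, a 5-dimensional state feature vector.
NUM_AGENTS = 4
NUM_ACTIONS = 3
STATE_DIM = 5

# Each local component Q_k depends only on a small subset ("scope") of agents.
# The scopes here are disjoint, so the greedy joint action decomposes per component.
SCOPES = [(0, 1), (2, 3)]

rng = np.random.default_rng(0)
# One linear weight table per component: state feature x encoded local joint action.
weights = [rng.normal(size=(STATE_DIM, NUM_ACTIONS ** len(scope))) for scope in SCOPES]

def local_q(k, state, local_action):
    """Value of component k for the actions of the agents in its scope."""
    idx = 0
    for a in local_action:                      # encode the local joint action as an index
        idx = idx * NUM_ACTIONS + a
    return state @ weights[k][:, idx]

def q_value(state, joint_action):
    """Factored Q: sum of local components, each evaluated on its own scope."""
    return sum(local_q(k, state, tuple(joint_action[i] for i in scope))
               for k, scope in enumerate(SCOPES))

def greedy_joint_action(state):
    """With disjoint scopes, each component is maximized independently:
    NUM_ACTIONS ** |scope| evaluations per component rather than
    NUM_ACTIONS ** NUM_AGENTS evaluations over the full joint action space."""
    joint = [0] * NUM_AGENTS
    for k, scope in enumerate(SCOPES):
        best = max(itertools.product(range(NUM_ACTIONS), repeat=len(scope)),
                   key=lambda la: local_q(k, state, la))
        for i, a in zip(scope, best):
            joint[i] = a
    return joint

state = rng.normal(size=STATE_DIM)
print(q_value(state, greedy_joint_action(state)))

In the zero-sum setting the same factored structure is used for the opposing team's minimization, which this illustration does not show.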

Cite

Text

Lagoudakis and Parr. "Learning in Zero-Sum Team Markov Games Using Factored Value Functions." Neural Information Processing Systems, 2002.

Markdown

[Lagoudakis and Parr. "Learning in Zero-Sum Team Markov Games Using Factored Value Functions." Neural Information Processing Systems, 2002.](https://mlanthology.org/neurips/2002/lagoudakis2002neurips-learning/)

BibTeX

@inproceedings{lagoudakis2002neurips-learning,
  title     = {{Learning in Zero-Sum Team Markov Games Using Factored Value Functions}},
  author    = {Lagoudakis, Michail G. and Parr, Ronald},
  booktitle = {Neural Information Processing Systems},
  year      = {2002},
  pages     = {1659-1666},
  url       = {https://mlanthology.org/neurips/2002/lagoudakis2002neurips-learning/}
}