Markov Games as a Framework for Multi-Agent Reinforcement Learning
Abstract
In the Markov decision process (MDP) formalization of reinforcement learning, a single adaptive agent interacts with an environment defined by a probabilistic transition function. In this solipsistic view, secondary agents can only be part of the environment and are therefore fixed in their behavior. The framework of Markov games allows us to widen this view to include multiple adaptive agents with interacting or competing goals. This paper considers a step in this direction in which exactly two agents with diametrically opposed goals share an environment. It describes a Q-learning-like algorithm for finding optimal policies and demonstrates its application to a simple two-player game in which the optimal policy is probabilistic.
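The Q-learning-like algorithm the abstract refers to is minimax-Q: the Q-table is indexed by the agent's action and the opponent's action, and the state value is the minimax value of the resulting matrix game. A minimal sketch, assuming a single-state matching-pennies game; the paper solves each stage game by linear programming, whereas here, for a self-contained illustration, the mixed strategy is found by a coarse grid search:

```python
import random

ACTIONS = [0, 1]  # heads, tails

def minimax_value(q):
    """Return (value, policy) maximizing, over mixed strategies pi,
    the worst case over opponent actions o of sum_a pi[a] * q[a][o].
    Grid search stands in for the paper's linear program."""
    best, best_pi = float("-inf"), None
    for i in range(101):
        pi = (i / 100, 1 - i / 100)
        worst = min(sum(pi[a] * q[a][o] for a in ACTIONS) for o in ACTIONS)
        if worst > best:
            best, best_pi = worst, pi
    return best, best_pi

def reward(a, o):
    # Matching pennies: agent wins if coins match (zero-sum).
    return 1.0 if a == o else -1.0

random.seed(0)
q = [[0.0, 0.0], [0.0, 0.0]]  # one state, so Q is a 2x2 payoff estimate
alpha, gamma = 1.0, 0.9
for _ in range(5000):
    a, o = random.choice(ACTIONS), random.choice(ACTIONS)
    v, _ = minimax_value(q)  # next-state value (same single state here)
    q[a][o] = (1 - alpha) * q[a][o] + alpha * (reward(a, o) + gamma * v)
    alpha *= 0.9995  # decaying learning rate

v, pi = minimax_value(q)
print(v, pi)  # value near 0, policy near (0.5, 0.5)
```

The learned policy is probabilistic, as the abstract notes: any deterministic choice in matching pennies can be exploited, so the minimax-optimal policy mixes both actions equally.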
Cite
Text
Littman. "Markov Games as a Framework for Multi-Agent Reinforcement Learning." International Conference on Machine Learning, 1994. doi:10.1016/B978-1-55860-335-6.50027-1
Markdown
[Littman. "Markov Games as a Framework for Multi-Agent Reinforcement Learning." International Conference on Machine Learning, 1994.](https://mlanthology.org/icml/1994/littman1994icml-markov/) doi:10.1016/B978-1-55860-335-6.50027-1
BibTeX
@inproceedings{littman1994icml-markov,
title = {{Markov Games as a Framework for Multi-Agent Reinforcement Learning}},
author = {Littman, Michael L.},
booktitle = {International Conference on Machine Learning},
year = {1994},
pages = {157-163},
doi = {10.1016/B978-1-55860-335-6.50027-1},
url = {https://mlanthology.org/icml/1994/littman1994icml-markov/}
}