Learning to Cooperate via Policy Search

Abstract

Cooperative games are those in which both agents share the same payoff structure. Value-based reinforcement-learning algorithms, such as variants of Q-learning, have been applied to learning cooperative games, but they only apply when the game state is completely observable to both agents. Policy search methods are a reasonable alternative to value-based methods for partially observable environments. In this paper, we provide a gradient-based distributed policy-search method for cooperative games and compare the notion of local optimum to that of Nash equilibrium. We demonstrate the effectiveness of this method experimentally in a small, partially observable simulated soccer domain.
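As a rough illustration of what a gradient-based distributed policy search can look like, the sketch below trains two independent softmax policies on a shared payoff using a REINFORCE-style update. It is not the algorithm from the paper: it assumes a stateless two-action game rather than the partially observable soccer domain, and the payoff table `PAYOFF`, the function `train`, the learning rate, and the running-average baseline are all hypothetical choices made for the example. Each agent adjusts only its own parameters and sees only its own action and the common reward.

```python
import math
import random

# Hypothetical shared payoff: both agents receive PAYOFF[a0][a1].
# (0, 0) and (1, 1) are both Nash equilibria; (0, 0) pays more.
PAYOFF = [[1.0, 0.0],
          [0.0, 0.5]]


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train(episodes=10000, lr=0.1, seed=0):
    """Run independent REINFORCE-style updates for two agents (assumed setup)."""
    rng = random.Random(seed)
    theta = [[0.0, 0.0], [0.0, 0.0]]  # one preference vector per agent
    baseline = 0.0                    # running average of the shared reward
    for _ in range(episodes):
        probs = [softmax(t) for t in theta]
        # Each agent samples its own action from its own policy.
        actions = [0 if rng.random() < p[0] else 1 for p in probs]
        reward = PAYOFF[actions[0]][actions[1]]
        advantage = reward - baseline
        baseline += 0.01 * (reward - baseline)
        # Distributed update: agent i touches only theta[i].
        for i in range(2):
            for a in range(2):
                # Gradient of log softmax probability of the chosen action.
                grad = (1.0 if a == actions[i] else 0.0) - probs[i][a]
                theta[i][a] += lr * advantage * grad
    return [softmax(t) for t in theta]


if __name__ == "__main__":
    for i, policy in enumerate(train()):
        print(f"agent {i}: P(action 0) = {policy[0]:.3f}")
```

In this toy game the lower-paying joint action (1, 1) is also a Nash equilibrium, so a gradient method started near it can settle there. That makes the example a small way to see why the relationship between local optima and Nash equilibria is worth examining.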

Cite

Text

Peshkin et al. "Learning to Cooperate via Policy Search." Conference on Uncertainty in Artificial Intelligence, 2000.

Markdown

[Peshkin et al. "Learning to Cooperate via Policy Search." Conference on Uncertainty in Artificial Intelligence, 2000.](https://mlanthology.org/uai/2000/peshkin2000uai-learning/)

BibTeX

@inproceedings{peshkin2000uai-learning,
  title     = {{Learning to Cooperate via Policy Search}},
  author    = {Peshkin, Leonid and Kim, Kee-Eung and Meuleau, Nicolas and Kaelbling, Leslie Pack},
  booktitle = {Conference on Uncertainty in Artificial Intelligence},
  year      = {2000},
  pages     = {489-496},
  url       = {https://mlanthology.org/uai/2000/peshkin2000uai-learning/}
}