Cooperative Multi-Agent Policy Gradient

Bono, Guillaume; Dibangoye, Jilles Steeve; Matignon, Laëtitia; Pereyron, Florian; Simonin, Olivier

doi:10.1007/978-3-030-10925-7_28

ECML-PKDD 2018 pp. 459-476

Abstract

Reinforcement Learning (RL) for decentralized partially observable Markov decision processes (Dec-POMDPs) is lagging behind the spectacular breakthroughs of single-agent RL. That is because assumptions that hold in single-agent settings are often obsolete in decentralized multi-agent systems. To tackle this issue, we investigate the foundations of policy gradient methods within the centralized training for decentralized control (CTDC) paradigm. In this paradigm, learning can be accomplished in a centralized manner while execution can still be independent. Using this insight, we establish the policy gradient theorem and compatible function approximations for decentralized multi-agent systems. The resulting actor-critic methods preserve decentralized control at the execution phase, but can also estimate the policy gradient from collective experiences guided by a centralized critic at the training phase. Experiments demonstrate that our policy gradient methods compare favorably against standard RL techniques in benchmarks from the literature. Code related to this paper is available at: https://gitlab.inria.fr/gbono/coop-ma-pg .
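The CTDC idea in the abstract (decentralized actors, centralized critic during training) can be illustrated with a minimal sketch. This is not the authors' algorithm and does not reproduce their compatible function approximation; it assumes a hypothetical stateless two-agent, two-action cooperative matrix game (`REWARD`), tabular softmax actors, a tabular critic over joint actions estimated by running means, and an expected-Q baseline. The names `train`, `softmax`, and `REWARD` are illustrative only; see the linked repository for the paper's actual code.

```python
import math
import random

# Hypothetical 2-agent, 2-action cooperative matrix game: a stateless,
# one-step stand-in for a Dec-POMDP. Both agents receive the same reward.
REWARD = [[1.0, 0.0],
          [0.0, 0.5]]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps=5000, lr=0.05, seed=0):
    rng = random.Random(seed)
    n_agents, n_actions = 2, 2
    # Decentralized actors: one independent logit vector per agent.
    logits = [[0.0] * n_actions for _ in range(n_agents)]
    # Centralized critic: tabular Q over *joint* actions, estimated as a
    # running mean of observed rewards (used only during training).
    q = [[0.0] * n_actions for _ in range(n_actions)]
    counts = [[0] * n_actions for _ in range(n_actions)]

    for _ in range(steps):
        probs = [softmax(l) for l in logits]
        # Decentralized execution: each agent samples from its own policy.
        joint = [rng.choices(range(n_actions), weights=p)[0] for p in probs]
        a1, a2 = joint
        r = REWARD[a1][a2]

        # Critic update (centralized: it sees the joint action).
        counts[a1][a2] += 1
        q[a1][a2] += (r - q[a1][a2]) / counts[a1][a2]

        # Baseline: expected Q under the current joint policy.
        baseline = sum(probs[0][i] * probs[1][j] * q[i][j]
                       for i in range(n_actions) for j in range(n_actions))
        advantage = q[a1][a2] - baseline

        # Actor updates: score-function gradient of each agent's softmax,
        # d log pi_i(a_i) / d logit_k = 1[k == a_i] - pi_i(k).
        for i, a in enumerate(joint):
            for k in range(n_actions):
                grad_log = (1.0 if k == a else 0.0) - probs[i][k]
                logits[i][k] += lr * advantage * grad_log

    return logits, q


if __name__ == "__main__":
    logits, q = train()
    greedy = [max(range(2), key=lambda k: l[k]) for l in logits]
    print("greedy joint action:", greedy)
```

The critic is the only component that conditions on the joint action, so after training each agent can act from its own `logits` alone; this mirrors, in a toy setting, the split between centralized training and decentralized execution described above.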

Cite

Text

Bono et al. "Cooperative Multi-Agent Policy Gradient." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2018. doi:10.1007/978-3-030-10925-7_28

Markdown

[Bono et al. "Cooperative Multi-Agent Policy Gradient." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2018.](https://mlanthology.org/ecmlpkdd/2018/bono2018ecmlpkdd-cooperative/) doi:10.1007/978-3-030-10925-7_28

BibTeX

@inproceedings{bono2018ecmlpkdd-cooperative,
  title     = {{Cooperative Multi-Agent Policy Gradient}},
  author    = {Bono, Guillaume and Dibangoye, Jilles Steeve and Matignon, Laëtitia and Pereyron, Florian and Simonin, Olivier},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2018},
  pages     = {459-476},
  doi       = {10.1007/978-3-030-10925-7_28},
  url       = {https://mlanthology.org/ecmlpkdd/2018/bono2018ecmlpkdd-cooperative/}
}