Counterfactual Multi-Agent Policy Gradients
Abstract
Many real-world problems, such as network packet routing and the coordination of autonomous vehicles, are naturally modelled as cooperative multi-agent systems. There is a great need for new reinforcement learning methods that can efficiently learn decentralised policies for such systems. To this end, we propose a new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies. In addition, to address the challenges of multi-agent credit assignment, it uses a counterfactual baseline that marginalises out a single agent's action, while keeping the other agents' actions fixed. COMA also uses a critic representation that allows the counterfactual baseline to be computed efficiently in a single forward pass. We evaluate COMA in the testbed of StarCraft unit micromanagement, using a decentralised variant with significant partial observability. COMA significantly improves average performance over other multi-agent actor-critic methods in this setting, and the best performing agents are competitive with state-of-the-art centralised controllers that get access to the full state.
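The counterfactual baseline described above can be illustrated with a small sketch. It is not the authors' implementation. It assumes the centralised critic has already produced one Q-value per candidate action of a single agent, with the other agents' actions held fixed. The names `counterfactual_advantage`, `q_values`, `policy`, and `action` are hypothetical and chosen only for illustration. The advantage is the Q-value of the action actually taken minus the policy-weighted average of the Q-values over all of that agent's actions.

```python
def counterfactual_advantage(q_values, policy, action):
    """Counterfactual advantage for one agent (illustrative sketch).

    q_values[k]: assumed critic estimate of Q(s, joint action) when this
                 agent takes action k and all other agents' actions are fixed.
    policy[k]:   probability this agent's policy assigns to action k.
    action:      index of the action the agent actually took.
    """
    if len(q_values) != len(policy):
        raise ValueError("q_values and policy must have the same length")
    # Baseline: marginalise out this agent's action under its own policy.
    baseline = sum(p * q for p, q in zip(policy, q_values))
    return q_values[action] - baseline


# Toy numbers (made up for illustration, not results from the paper).
q = [1.0, 2.0, 4.0]
pi = [0.5, 0.25, 0.25]
adv = counterfactual_advantage(q, pi, action=2)

# The baseline depends only on the other agents' actions and this agent's
# policy, so the advantage averages to zero under that policy.
expected_adv = sum(p * counterfactual_advantage(q, pi, k) for k, p in enumerate(pi))
```

In this sketch the list `q` stands in for a critic output that contains a Q-value for every action of the agent at once, which is what makes it possible to compute the baseline from a single forward pass.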
Cite
Text
Foerster et al. "Counterfactual Multi-Agent Policy Gradients." AAAI Conference on Artificial Intelligence, 2018. doi:10.1609/AAAI.V32I1.11794
Markdown
[Foerster et al. "Counterfactual Multi-Agent Policy Gradients." AAAI Conference on Artificial Intelligence, 2018.](https://mlanthology.org/aaai/2018/foerster2018aaai-counterfactual/) doi:10.1609/AAAI.V32I1.11794
BibTeX
@inproceedings{foerster2018aaai-counterfactual,
title = {{Counterfactual Multi-Agent Policy Gradients}},
author = {Foerster, Jakob N. and Farquhar, Gregory and Afouras, Triantafyllos and Nardelli, Nantas and Whiteson, Shimon},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2018},
pages = {2974-2982},
doi = {10.1609/AAAI.V32I1.11794},
url = {https://mlanthology.org/aaai/2018/foerster2018aaai-counterfactual/}
}