Difference Advantage Estimation for Multi-Agent Policy Gradients
Abstract
Multi-agent policy gradient methods under centralized training with decentralized execution have recently seen much progress. During centralized training, multi-agent credit assignment is crucial and can substantially improve learning performance. However, explicit multi-agent credit assignment in multi-agent policy gradient methods has received relatively little attention. In this paper, we investigate multi-agent credit assignment induced by reward shaping and provide a theoretical understanding of its credit assignment and policy bias. Based on this, we propose an exponentially weighted advantage estimator, analogous to GAE, that enables multi-agent credit assignment while allowing a tradeoff with policy bias. Empirical results show that our approach performs effective multi-agent credit assignment and thus substantially outperforms other advantage estimators.
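The abstract does not give the estimator's exact form. As a rough, hypothetical sketch of the kind of construction it describes, the snippet below combines per-step difference terms with a GAE-style exponential weighting: `counterfactual_values` stands in for an assumed per-agent counterfactual baseline, and `lam` is the weighting parameter that trades credit-assignment strength against policy bias. All names and the specific per-step term are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def exp_weighted_difference_advantages(rewards, values, counterfactual_values,
                                        gamma=0.99, lam=0.95):
    """Hypothetical sketch: GAE-style exponentially weighted sum of per-step
    difference terms for one agent. Not the paper's exact estimator.

    rewards:               shared team reward r_t, shape (T,)
    values:                centralized value estimates V(s_t), shape (T+1,)
    counterfactual_values: assumed per-agent counterfactual baseline b_i(s_t), shape (T,)
    """
    T = len(rewards)
    advantages = np.zeros(T)
    acc = 0.0
    for t in reversed(range(T)):
        # Per-step difference term: a TD-like error measured against the
        # agent-specific counterfactual baseline instead of V(s_t) alone.
        delta = rewards[t] + gamma * values[t + 1] - counterfactual_values[t]
        # Exponential weighting by (gamma * lam), as in GAE; lam controls the
        # tradeoff between credit-assignment strength and policy bias.
        acc = delta + gamma * lam * acc
        advantages[t] = acc
    return advantages
```

With `lam = 0` the estimate reduces to the single-step difference term, while `lam = 1` accumulates the full discounted sum; intermediate values interpolate between the two, mirroring how GAE interpolates between TD and Monte Carlo targets.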
Cite

Li et al. "Difference Advantage Estimation for Multi-Agent Policy Gradients." International Conference on Machine Learning, 2022.

BibTeX
@inproceedings{li2022icml-difference,
  title     = {{Difference Advantage Estimation for Multi-Agent Policy Gradients}},
  author    = {Li, Yueheng and Xie, Guangming and Lu, Zongqing},
  booktitle = {International Conference on Machine Learning},
  year      = {2022},
  pages     = {13066--13085},
  volume    = {162},
  url       = {https://mlanthology.org/icml/2022/li2022icml-difference/}
}