Coordinated Multi-Agent Reinforcement Learning in Networked Distributed POMDPs
Abstract
In many multi-agent applications such as distributed sensor networks, a network of agents acts collaboratively under uncertainty and with only local interactions. The Networked Distributed POMDP (ND-POMDP) provides a framework for modeling such cooperative multi-agent decision making. Existing work on ND-POMDPs has focused on offline techniques that require accurate models, which are usually costly to obtain in practice. This paper presents a model-free, scalable learning approach that synthesizes multi-agent reinforcement learning (MARL) and distributed constraint optimization (DCOP). By exploiting the structured interaction in ND-POMDPs, our approach distributes the learning of the joint policy and employs DCOP techniques to coordinate the distributed learning so as to ensure global learning performance. Our approach can learn a globally optimal policy for ND-POMDPs with a property called groupwise observability. Experimental results show that, with communication during learning and execution, our approach significantly outperforms the nearly optimal non-communication policies computed offline.
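The abstract describes an approach that factors the joint value function over the agents' interaction structure and uses DCOP techniques to coordinate distributed learning and joint action selection. The Python sketch below is only a rough illustration of that general flavor, not the paper's algorithm: it maintains edge-local Q-functions whose sum approximates the joint value, and it stands in a brute-force search over the small joint action space where a DCOP solver would be used. The class name, observation handling, and all parameters are hypothetical.

```python
# Illustrative sketch (not the authors' algorithm): factored Q-learning over an
# interaction graph. Each edge (i, j) keeps a local Q-table; the joint value is
# the sum of edge-local values, mirroring the structured interaction assumption.
import itertools
import random
from collections import defaultdict


class FactoredQLearner:
    def __init__(self, edges, n_agents, n_actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.edges = edges            # pairs of agents that interact, e.g. [(0, 1), (1, 2)]
        self.n_agents = n_agents
        self.n_actions = n_actions    # actions per agent (assumed identical for simplicity)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        # one local Q-table per edge, keyed by (obs_i, obs_j, a_i, a_j)
        self.q = {e: defaultdict(float) for e in edges}

    def joint_value(self, obs, joint_action):
        # global value = sum of edge-local values over the interaction graph
        return sum(
            self.q[(i, j)][(obs[i], obs[j], joint_action[i], joint_action[j])]
            for (i, j) in self.edges
        )

    def greedy_joint_action(self, obs):
        # exhaustive maximisation over joint actions; a DCOP solver (e.g. max-sum
        # style message passing) would replace this in larger problems
        candidates = itertools.product(range(self.n_actions), repeat=self.n_agents)
        return max(candidates, key=lambda a: self.joint_value(obs, a))

    def select_joint_action(self, obs):
        # epsilon-greedy exploration over joint actions
        if random.random() < self.epsilon:
            return tuple(random.randrange(self.n_actions) for _ in range(self.n_agents))
        return self.greedy_joint_action(obs)

    def update(self, obs, joint_action, reward, next_obs):
        # one-step TD update; the same global TD error is applied to every
        # edge-local Q-function
        best_next = self.greedy_joint_action(next_obs)
        target = reward + self.gamma * self.joint_value(next_obs, best_next)
        td_error = target - self.joint_value(obs, joint_action)
        for (i, j) in self.edges:
            key = (obs[i], obs[j], joint_action[i], joint_action[j])
            self.q[(i, j)][key] += self.alpha * td_error


if __name__ == "__main__":
    # Hypothetical 3-agent chain: agents 0-1 and 1-2 interact.
    learner = FactoredQLearner(edges=[(0, 1), (1, 2)], n_agents=3, n_actions=2)
    obs = (0, 1, 0)                               # stand-in local observations
    act = learner.select_joint_action(obs)
    learner.update(obs, act, reward=1.0, next_obs=(1, 1, 0))
```

The additive decomposition is what keeps the learning distributed: each edge only needs the observations and actions of its two endpoints, while the coordination step (here the exhaustive maximisation) is the part that DCOP techniques would handle in the paper's setting.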
Cite
Text
Zhang and Lesser. "Coordinated Multi-Agent Reinforcement Learning in Networked Distributed POMDPs." AAAI Conference on Artificial Intelligence, 2011. doi:10.1609/AAAI.V25I1.7886
BibTeX
@inproceedings{zhang2011aaai-coordinated,
title = {{Coordinated Multi-Agent Reinforcement Learning in Networked Distributed POMDPs}},
author = {Zhang, Chongjie and Lesser, Victor R.},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2011},
pages = {764-770},
doi = {10.1609/AAAI.V25I1.7886},
url = {https://mlanthology.org/aaai/2011/zhang2011aaai-coordinated/}
}