Coordinated Multi-Agent Reinforcement Learning in Networked Distributed POMDPs

Abstract

In many multi-agent applications, such as distributed sensor networks, a network of agents acts collaboratively under uncertainty and local interactions. The Networked Distributed POMDP (ND-POMDP) framework models such cooperative multi-agent decision making. Existing work on ND-POMDPs has focused on offline techniques that require accurate models, which are usually costly to obtain in practice. This paper presents a model-free, scalable learning approach that synthesizes multi-agent reinforcement learning (MARL) and distributed constraint optimization (DCOP). By exploiting the structured interaction in ND-POMDPs, our approach distributes the learning of the joint policy and employs DCOP techniques to coordinate the distributed learning and ensure global learning performance. Our approach can learn a globally optimal policy for ND-POMDPs that satisfy a property called groupwise observability. Experimental results show that, with communication during learning and execution, our approach significantly outperforms the nearly-optimal non-communication policies computed offline.
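
The sketch below is a minimal illustration of the general idea the abstract describes, not the authors' algorithm: each pairwise interaction in the agent network keeps its own local Q-table, and a DCOP-style step selects the joint action that maximizes the sum of local Q-values. The toy chain of three agents, the tabular representation, and the exhaustive coordination step (standing in for a message-passing DCOP solver such as max-plus) are all illustrative assumptions.

import itertools
import random
from collections import defaultdict

AGENTS = [0, 1, 2]                 # three agents in a chain: 0 -- 1 -- 2
EDGES = [(0, 1), (1, 2)]           # local interactions (the "network")
ACTIONS = [0, 1]                   # each agent's local action set
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

# One Q-table per edge, indexed by (joint local observation, joint local action).
Q = {e: defaultdict(float) for e in EDGES}

def coordinate(obs):
    """DCOP-style joint action selection: maximize the sum of edge Q-values.
    With 3 agents and 2 actions each, exhaustive search stands in for max-plus."""
    best, best_val = None, float("-inf")
    for joint in itertools.product(ACTIONS, repeat=len(AGENTS)):
        val = sum(Q[(i, j)][(obs[i], obs[j]), (joint[i], joint[j])]
                  for (i, j) in EDGES)
        if val > best_val:
            best, best_val = joint, val
    return best

def env_step(obs, joint):
    """Toy environment (an assumption): each edge is rewarded when its agents agree."""
    reward = sum(1.0 if joint[i] == joint[j] else -0.1 for (i, j) in EDGES)
    next_obs = tuple(random.choice([0, 1]) for _ in AGENTS)
    return reward, next_obs

obs = tuple(random.choice([0, 1]) for _ in AGENTS)
for _ in range(2000):
    # Epsilon-greedy exploration over coordinated joint actions.
    joint = (tuple(random.choice(ACTIONS) for _ in AGENTS)
             if random.random() < EPS else coordinate(obs))
    reward, next_obs = env_step(obs, joint)
    next_joint = coordinate(next_obs)
    # Distributed Q-learning update: each edge updates its local Q-table,
    # with the global reward split evenly across edges (a simplifying assumption).
    for (i, j) in EDGES:
        key = ((obs[i], obs[j]), (joint[i], joint[j]))
        target = reward / len(EDGES) + GAMMA * Q[(i, j)][
            (next_obs[i], next_obs[j]), (next_joint[i], next_joint[j])]
        Q[(i, j)][key] += ALPHA * (target - Q[(i, j)][key])
    obs = next_obs

print("coordinated joint action for obs (0, 0, 0):", coordinate((0, 0, 0)))

Because the toy reward favors agreement along each edge, the coordinated selection converges toward joint actions in which neighboring agents match, illustrating how local Q-tables plus a DCOP coordination step can yield globally coherent behavior.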

Cite

Text

Zhang and Lesser. "Coordinated Multi-Agent Reinforcement Learning in Networked Distributed POMDPs." AAAI Conference on Artificial Intelligence, 2011. doi:10.1609/AAAI.V25I1.7886

Markdown

[Zhang and Lesser. "Coordinated Multi-Agent Reinforcement Learning in Networked Distributed POMDPs." AAAI Conference on Artificial Intelligence, 2011.](https://mlanthology.org/aaai/2011/zhang2011aaai-coordinated/) doi:10.1609/AAAI.V25I1.7886

BibTeX

@inproceedings{zhang2011aaai-coordinated,
  title     = {{Coordinated Multi-Agent Reinforcement Learning in Networked Distributed POMDPs}},
  author    = {Zhang, Chongjie and Lesser, Victor R.},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2011},
  pages     = {764--770},
  doi       = {10.1609/AAAI.V25I1.7886},
  url       = {https://mlanthology.org/aaai/2011/zhang2011aaai-coordinated/}
}