R-MADDPG for Partially Observable Environments and Limited Communication

Abstract

Many real-world tasks would benefit from multiagent reinforcement learning (MARL) algorithms, including the coordination of multiple agents such as self-driving cars or autonomous delivery drones. Real-world conditions are challenging for multiagent systems because the environment is partially observable and nonstationary. Moreover, if agents must share a limited resource (e.g., communication network bandwidth), they must all learn how to coordinate its use. These aspects make learning very challenging. This paper introduces a deep recurrent multiagent actor-critic framework for multiagent coordination under partially observable settings and limited communication. We investigate the effects of recurrency on the performance and communication use of a team of agents, and demonstrate that the resulting framework learns time dependencies both for sharing missing observations and for handling resource limitations. It gives rise to different communication patterns among agents while performing as well as current multiagent actor-critic methods under fully observable settings.
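The core idea behind the recurrent actor-critic framework is that each agent's policy carries a hidden state, so the agent can act on its observation history rather than a single (possibly incomplete) observation. Below is a minimal sketch of such a recurrent actor in PyTorch; the class name, layer sizes, and rollout loop are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a recurrent actor of the kind used
# in R-MADDPG-style agents. Names and dimensions are illustrative.
import torch
import torch.nn as nn

class RecurrentActor(nn.Module):
    """Policy network with an LSTM hidden state, letting the agent
    integrate its observation history under partial observability."""
    def __init__(self, obs_dim, act_dim, hidden_dim=64):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, act_dim)

    def forward(self, obs, hidden=None):
        # obs: (batch, seq_len, obs_dim); hidden carries memory across steps
        x = torch.relu(self.encoder(obs))
        x, hidden = self.lstm(x, hidden)
        action = torch.tanh(self.head(x))  # continuous actions in [-1, 1]
        return action, hidden

# Example rollout for one agent: the hidden state is threaded through time,
# so earlier inputs (e.g., messages received from teammates) can influence
# later actions, including the decision of whether to communicate.
actor = RecurrentActor(obs_dim=10, act_dim=2)
obs = torch.zeros(1, 1, 10)  # one agent, one timestep
hidden = None
for _ in range(5):
    action, hidden = actor(obs, hidden)
```

In a centralized-training, decentralized-execution setup such as MADDPG, a critic (which could likewise be made recurrent) would be trained with access to all agents' observations and actions, while each recurrent actor above acts only on its own local history at execution time.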

Cite

Text

Wang et al. "R-MADDPG for Partially Observable Environments and Limited Communication." ICML 2019 Workshops: RL4RealLife, 2019.

Markdown

[Wang et al. "R-MADDPG for Partially Observable Environments and Limited Communication." ICML 2019 Workshops: RL4RealLife, 2019.](https://mlanthology.org/icmlw/2019/wang2019icmlw-rmaddpg/)

BibTeX

@inproceedings{wang2019icmlw-rmaddpg,
  title     = {{R-MADDPG for Partially Observable Environments and Limited Communication}},
  author    = {Wang, Rose E. and Everett, Michael and How, Jonathan P.},
  booktitle = {ICML 2019 Workshops: RL4RealLife},
  year      = {2019},
  url       = {https://mlanthology.org/icmlw/2019/wang2019icmlw-rmaddpg/}
}