Reinforcement Learning Under a Multi-Agent Predictive State Representation Model: Method and Theory
Abstract
We study reinforcement learning for partially observable multi-agent systems in which each agent has access only to its own observations and rewards and aims to maximize its cumulative reward. To handle partial observations, we propose graph-assisted predictive state representations (GAPSR), a scalable multi-agent representation learning framework that leverages agent connectivity graphs to aggregate the local representations computed by each agent. In addition, our representations readily incorporate dynamic interaction graphs and kernel space embeddings of the predictive states, and thus have strong flexibility and representation power. Based on GAPSR, we propose an end-to-end MARL algorithm that simultaneously infers the predictive representations and uses them as the input to a policy optimization algorithm. Empirically, we demonstrate the efficacy of the proposed algorithm on both a MAMuJoCo robotic learning experiment and a multi-agent particle learning environment.
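To make the aggregation idea concrete, below is a minimal sketch, not the authors' implementation, of how per-agent local features might be combined over a connectivity graph before being fed to a policy optimizer. The module names, dimensions, and the mean-aggregation rule are illustrative assumptions; the paper's actual predictive state encoder and aggregation scheme may differ.

import torch
import torch.nn as nn


class GraphAggregatedRepresentation(nn.Module):
    """Sketch: aggregate per-agent local features over a connectivity graph."""

    def __init__(self, obs_dim: int, feat_dim: int):
        super().__init__()
        # Hypothetical stand-in for each agent's local predictive-state encoder.
        self.local_encoder = nn.Linear(obs_dim, feat_dim)
        self.mix = nn.Linear(2 * feat_dim, feat_dim)

    def forward(self, obs: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # obs: (n_agents, obs_dim); adj: (n_agents, n_agents) 0/1 adjacency,
        # which may change every step (dynamic interaction graphs).
        local = torch.relu(self.local_encoder(obs))           # (n, d)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        neighbor_mean = (adj @ local) / deg                   # (n, d)
        # Each agent's representation combines its own feature with the
        # graph-aggregated neighborhood feature.
        return torch.relu(self.mix(torch.cat([local, neighbor_mean], dim=-1)))


if __name__ == "__main__":
    n_agents, obs_dim, feat_dim = 4, 8, 16
    model = GraphAggregatedRepresentation(obs_dim, feat_dim)
    obs = torch.randn(n_agents, obs_dim)
    adj = (torch.rand(n_agents, n_agents) > 0.5).float()
    reps = model(obs, adj)  # would serve as input to a policy optimizer
    print(reps.shape)       # torch.Size([4, 16])

The design point the sketch highlights is that the graph enters only through the aggregation step, so a time-varying adjacency matrix can be swapped in at every step without changing the learned parameters.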
Cite
Text
Zhang et al. "Reinforcement Learning Under a Multi-Agent Predictive State Representation Model: Method and Theory." International Conference on Learning Representations, 2022.
Markdown
[Zhang et al. "Reinforcement Learning Under a Multi-Agent Predictive State Representation Model: Method and Theory." International Conference on Learning Representations, 2022.](https://mlanthology.org/iclr/2022/zhang2022iclr-reinforcement/)
BibTeX
@inproceedings{zhang2022iclr-reinforcement,
title = {{Reinforcement Learning Under a Multi-Agent Predictive State Representation Model: Method and Theory}},
author = {Zhang, Zhi and Yang, Zhuoran and Liu, Han and Tokekar, Pratap and Huang, Furong},
booktitle = {International Conference on Learning Representations},
year = {2022},
url = {https://mlanthology.org/iclr/2022/zhang2022iclr-reinforcement/}
}