Regret Bounds for Decentralized Learning in Cooperative Multi-Agent Dynamical Systems

Seyed Mohammad Asghari, Yi Ouyang, Ashutosh Nayyar

UAI 2020 pp. 121-130


Abstract

Regret analysis is challenging in Multi-Agent Reinforcement Learning (MARL), primarily because the environment is dynamic and information is decentralized among agents. We address this challenge in the context of decentralized learning in multi-agent linear-quadratic (LQ) dynamical systems. We begin with a simple setup consisting of two agents and two dynamically decoupled stochastic linear systems, each system controlled by an agent. The systems are coupled through a quadratic cost function. When both systems’ dynamics are unknown and there is no communication among the agents, we show that no learning policy can achieve regret that is sub-linear in $T$, where $T$ is the time horizon. When only one system’s dynamics are unknown and there is one-directional communication from the agent controlling the unknown system to the other agent, we propose a MARL algorithm based on the construction of an auxiliary single-agent LQ problem. The auxiliary single-agent problem in the proposed MARL algorithm serves as an implicit coordination mechanism between the two learning agents. This allows the agents to achieve a regret within $O(\sqrt{T})$ of the regret of the auxiliary single-agent problem. Consequently, using existing results for single-agent LQ regret, our algorithm provides a $\tilde{O}(\sqrt{T})$ regret bound. (Here $\tilde{O}(\cdot)$ hides constants and logarithmic factors.) Our numerical experiments indicate that this bound is matched in practice. From the two-agent problem, we extend our results to multi-agent LQ systems with certain communication patterns that appear in vehicle platoon control.
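
For readers who want the setup in symbols, the following is a minimal sketch of the two-agent problem the abstract describes. The notation ($A_i$, $B_i$, $w^i_t$, $Q$, $R$, $J^*$, $R(T)$, $R^{\mathrm{aux}}(T)$) is assumed here for illustration and is not taken from the paper, which may define these quantities differently (for example, regret may be measured in expectation).

```latex
% Two dynamically decoupled stochastic linear systems, one per agent
% (assumed notation; w^i_t denotes process noise).
\[
  x^i_{t+1} = A_i x^i_t + B_i u^i_t + w^i_t, \qquad i \in \{1, 2\}.
\]

% Coupling enters only through a quadratic cost on the stacked state and input,
% where Q and R may have non-zero off-diagonal blocks.
\[
  c(x_t, u_t) = x_t^\top Q\, x_t + u_t^\top R\, u_t,
  \qquad
  x_t = \begin{bmatrix} x^1_t \\ x^2_t \end{bmatrix},
  \quad
  u_t = \begin{bmatrix} u^1_t \\ u^2_t \end{bmatrix}.
\]

% Regret over horizon T, relative to the optimal average cost J^* that is
% attainable when the dynamics are known.
\[
  R(T) = \sum_{t=0}^{T-1} \bigl( c(x_t, u_t) - J^* \bigr).
\]

% The abstract's argument: the multi-agent regret is within O(sqrt(T)) of the
% regret of the auxiliary single-agent problem, which is itself \tilde{O}(sqrt(T)).
\[
  R(T) \le R^{\mathrm{aux}}(T) + O(\sqrt{T}),
  \qquad
  R^{\mathrm{aux}}(T) = \tilde{O}(\sqrt{T})
  \;\Longrightarrow\;
  R(T) = \tilde{O}(\sqrt{T}).
\]
```

Under this reading, sub-linear regret means $R(T)/T \to 0$, so the average cost incurred while learning approaches $J^*$. The negative result in the abstract says this is impossible when both systems are unknown and the agents cannot communicate.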


Cite

Text

Asghari et al. "Regret Bounds for Decentralized Learning in Cooperative Multi-Agent Dynamical Systems." Uncertainty in Artificial Intelligence, 2020.

Markdown

[Asghari et al. "Regret Bounds for Decentralized Learning in Cooperative Multi-Agent Dynamical Systems." Uncertainty in Artificial Intelligence, 2020.](https://mlanthology.org/uai/2020/mohammadasghari2020uai-regret/)

BibTeX

@inproceedings{mohammadasghari2020uai-regret,
  title     = {{Regret Bounds for Decentralized Learning in Cooperative Multi-Agent Dynamical Systems}},
  author    = {Asghari, Seyed Mohammad and Ouyang, Yi and Nayyar, Ashutosh},
  booktitle = {Uncertainty in Artificial Intelligence},
  year      = {2020},
  pages     = {121-130},
  volume    = {124},
  url       = {https://mlanthology.org/uai/2020/mohammadasghari2020uai-regret/}
}