Multi-Agent Advisor Q-Learning

Abstract

In the last decade, there have been significant advances in multi-agent reinforcement learning (MARL), but numerous challenges, such as high sample complexity and slow convergence to stable policies, must still be overcome before widespread deployment is possible. However, many real-world environments already deploy sub-optimal or heuristic approaches for generating policies in practice. An interesting question that arises is how best to use such approaches as advisors to help improve reinforcement learning in multi-agent domains. In this paper, we provide a principled framework for incorporating action recommendations from online sub-optimal advisors in multi-agent settings. We describe the problem of ADvising Multiple Intelligent Reinforcement Agents (ADMIRAL) in nonrestrictive general-sum stochastic game environments and present two novel Q-learning based algorithms: ADMIRAL - Decision Making (ADMIRAL-DM), which improves learning by appropriately incorporating advice from an advisor, and ADMIRAL - Advisor Evaluation (ADMIRAL-AE), which evaluates the effectiveness of an advisor. We analyze the algorithms theoretically and provide fixed-point guarantees regarding their learning in general-sum stochastic games. Furthermore, extensive experiments illustrate that these algorithms can be used in a variety of environments, compare favourably in performance to other related baselines, can scale to large state-action spaces, and are robust to poor advice from advisors.

Cite

Text

Subramanian et al. "Multi-Agent Advisor Q-Learning." Journal of Artificial Intelligence Research, 2022. doi:10.1613/JAIR.1.13445

Markdown

[Subramanian et al. "Multi-Agent Advisor Q-Learning." Journal of Artificial Intelligence Research, 2022.](https://mlanthology.org/jair/2022/subramanian2022jair-multiagent/) doi:10.1613/JAIR.1.13445

BibTeX

@article{subramanian2022jair-multiagent,
  title     = {{Multi-Agent Advisor Q-Learning}},
  author    = {Subramanian, Sriram Ganapathi and Taylor, Matthew E. and Larson, Kate and Crowley, Mark},
  journal   = {Journal of Artificial Intelligence Research},
  year      = {2022},
  pages     = {1--74},
  doi       = {10.1613/JAIR.1.13445},
  volume    = {74},
  url       = {https://mlanthology.org/jair/2022/subramanian2022jair-multiagent/}
}