CMIX: Deep Multi-Agent Reinforcement Learning with Peak and Average Constraints

Abstract

In many real-world tasks, a team of learning agents must ensure that their optimized policies collectively satisfy required peak and average constraints while acting in a decentralized manner. In this paper, we consider multi-agent reinforcement learning for a constrained, partially observable Markov decision process, in which the agents must maximize a global reward function subject to both peak and average constraints. We propose a novel algorithm, CMIX, that enables centralized training and decentralized execution (CTDE) under these constraints. In particular, CMIX amends the reward function to account for peak constraint violations and then transforms the resulting problem under average constraints into a max-min optimization problem. We leverage value function factorization to develop a CTDE algorithm that solves this max-min problem, and we propose two gap loss functions to eliminate the bias of the learned solutions. We evaluate CMIX on a blocker game with travel cost and on a large-scale vehicular network routing problem. The results show that CMIX outperforms existing algorithms, including IQL, VDN, and QMIX, in that it optimizes the global reward objective while satisfying both peak and average constraints. To the best of our knowledge, this is the first proposal of a CTDE learning algorithm subject to both peak and average constraints.
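The two constraint-handling steps named in the abstract (folding peak constraints into the reward, then treating average constraints as a max-min problem) can be illustrated with a minimal sketch. This is not the CMIX implementation: the fixed penalty for peak violations, the standard Lagrangian relaxation of the average constraints, and all function names (`amend_reward`, `lagrangian_reward`, `dual_step`) are illustrative assumptions. For fixed multipliers, the inner maximization trains agents on the shaped reward; the outer minimization adjusts the multipliers by projected gradient descent.

```python
"""Hypothetical sketch of peak/average constraint handling (not the paper's code).

Assumptions: peak violations are penalized by a fixed constant, and average
constraints E[c_j] <= b_j are relaxed with multipliers lambda_j >= 0, giving
max over policies, min over lambda of E[r] - sum_j lambda_j * (E[c_j] - b_j).
"""

from typing import List, Sequence


def amend_reward(reward: float,
                 peak_costs: Sequence[float],
                 peak_limits: Sequence[float],
                 penalty: float) -> float:
    """Subtract a fixed penalty if any instantaneous (peak) cost exceeds its limit."""
    violated = any(c > limit for c, limit in zip(peak_costs, peak_limits))
    return reward - penalty if violated else reward


def lagrangian_reward(reward: float,
                      step_costs: Sequence[float],
                      lambdas: Sequence[float]) -> float:
    """Per-step reward for the inner maximization with lambdas held fixed:
    r - sum_j lambda_j * c_j. The constant term sum_j lambda_j * b_j is dropped
    because it does not change which policy is optimal."""
    return reward - sum(lam * c for lam, c in zip(lambdas, step_costs))


def dual_step(lambdas: Sequence[float],
              avg_costs: Sequence[float],
              limits: Sequence[float],
              lr: float) -> List[float]:
    """Projected gradient step for the outer minimization over lambda >= 0:
    a multiplier grows when its average cost exceeds its limit and shrinks
    toward zero otherwise."""
    return [max(0.0, lam + lr * (c - b))
            for lam, c, b in zip(lambdas, avg_costs, limits)]


if __name__ == "__main__":
    # A peak cost of 2.0 exceeds its limit of 1.0, so the penalty applies.
    print(amend_reward(1.0, peak_costs=[0.5, 2.0], peak_limits=[1.0, 1.0], penalty=10.0))  # -9.0
    # The first constraint is violated on average (0.75 > 0.5); the second is slack.
    print(dual_step([0.0, 1.0], avg_costs=[0.75, 0.25], limits=[0.5, 0.5], lr=1.0))  # [0.25, 0.75]
```

In a CTDE setting, the shaped reward from `lagrangian_reward` would be what a factorized joint value function (for example, a QMIX-style mixing network) is trained on between multiplier updates. The sketch omits that network and the paper's two gap loss functions.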

Cite

Text

Liu et al. "CMIX: Deep Multi-Agent Reinforcement Learning with Peak and Average Constraints." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021. doi:10.1007/978-3-030-86486-6_10

Markdown

[Liu et al. "CMIX: Deep Multi-Agent Reinforcement Learning with Peak and Average Constraints." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021.](https://mlanthology.org/ecmlpkdd/2021/liu2021ecmlpkdd-cmix/) doi:10.1007/978-3-030-86486-6_10

BibTeX

@inproceedings{liu2021ecmlpkdd-cmix,
  title     = {{CMIX: Deep Multi-Agent Reinforcement Learning with Peak and Average Constraints}},
  author    = {Liu, Chenyi and Geng, Nan and Aggarwal, Vaneet and Lan, Tian and Yang, Yuan and Xu, Mingwei},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2021},
  pages     = {157--173},
  doi       = {10.1007/978-3-030-86486-6_10},
  url       = {https://mlanthology.org/ecmlpkdd/2021/liu2021ecmlpkdd-cmix/}
}