I2Q: A Fully Decentralized Q-Learning Algorithm
Abstract
Fully decentralized multi-agent reinforcement learning has shown great potential for many real-world cooperative tasks, where global information, e.g., the actions of other agents, is not accessible. Although independent Q-learning is widely used for decentralized training, the transition probabilities are non-stationary because other agents update their policies simultaneously, so the convergence of independent Q-learning is not guaranteed. To deal with non-stationarity, we first introduce stationary ideal transition probabilities, on which independent Q-learning can converge to the global optimum. We then propose a fully decentralized method, I2Q, which performs independent Q-learning on a modeled ideal transition function to reach the global optimum. The modeling of the ideal transition function in I2Q is fully decentralized and independent of the learned policies of other agents, which frees I2Q from non-stationarity and allows it to learn the optimal policy. Empirically, we show that I2Q achieves remarkable improvement in a variety of cooperative multi-agent tasks.
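To make the setting concrete, below is a minimal sketch of the standard independent Q-learning baseline referenced in the abstract, where each agent maintains a Q-table over its own actions only and never observes the other agents' actions. The environment interface, sizes, and hyperparameters are hypothetical placeholders, and the ideal-transition modeling that distinguishes I2Q is not reproduced here, since the abstract does not specify its form.

```python
import numpy as np

n_states, n_actions = 10, 4          # hypothetical task sizes
alpha, gamma, eps = 0.1, 0.95, 0.1   # hypothetical hyperparameters

# Each agent keeps its own Q-table over (state, own action) only; because the
# other agent keeps changing its policy, the transition distribution each agent
# experiences is non-stationary, which is the issue the abstract highlights.
Q = [np.zeros((n_states, n_actions)) for _ in range(2)]

def act(agent, s):
    """Epsilon-greedy action from a single agent's own Q-values."""
    if np.random.rand() < eps:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[agent][s]))

def independent_q_update(agent, s, a, r, s_next):
    """One-step independent Q-learning update for a single agent."""
    td_target = r + gamma * Q[agent][s_next].max()
    Q[agent][s, a] += alpha * (td_target - Q[agent][s, a])
```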
Cite

Text
Jiang and Lu. "I2Q: A Fully Decentralized Q-Learning Algorithm." Neural Information Processing Systems, 2022.

Markdown
[Jiang and Lu. "I2Q: A Fully Decentralized Q-Learning Algorithm." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/jiang2022neurips-i2q/)

BibTeX
@inproceedings{jiang2022neurips-i2q,
title = {{I2Q: A Fully Decentralized Q-Learning Algorithm}},
author = {Jiang, Jiechuan and Lu, Zongqing},
booktitle = {Neural Information Processing Systems},
year = {2022},
url = {https://mlanthology.org/neurips/2022/jiang2022neurips-i2q/}
}