Learning Deep Decentralized Policy Network by Collective Rewards for Real-Time Combat Game
Abstract
The task in a real-time combat game is to coordinate multiple units to defeat enemies controlled by a given opponent in a real-time combat scenario. Designing a high-level Artificial Intelligence (AI) program for such a task is difficult due to its extremely large state-action space and real-time requirements. This paper formulates the task as a collective decentralized partially observable Markov decision process and designs a Deep Decentralized Policy Network (DDPN) to model the policies. To train DDPN effectively, a novel two-stage learning algorithm is proposed which combines imitation learning from the opponent with reinforcement learning by no-regret dynamics. Extensive experimental results on various combat scenarios indicate that the proposed method can defeat different opponent models and significantly outperforms many state-of-the-art approaches.
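For readers unfamiliar with decentralized control, the sketch below illustrates the general idea behind a decentralized policy network in PyTorch: a single weight-shared network maps each unit's local observation to that unit's action distribution, so units act independently while the model size stays fixed as the number of units grows. This is a minimal illustration under assumptions of our own; the feed-forward architecture, dimensions, and discrete action space are hypothetical and do not reproduce the paper's actual DDPN or its two-stage training procedure.

```python
import torch
import torch.nn as nn

class DecentralizedPolicy(nn.Module):
    """Weight-shared per-unit policy: each unit maps its local
    observation to a distribution over discrete actions."""

    def __init__(self, obs_dim: int, n_actions: int, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.distributions.Categorical:
        # obs: (n_units, obs_dim) -- each row is one unit's local observation
        logits = self.net(obs)
        return torch.distributions.Categorical(logits=logits)

if __name__ == "__main__":
    # Hypothetical sizes: 5 units, 32-dim observations, 9 discrete actions.
    policy = DecentralizedPolicy(obs_dim=32, n_actions=9)
    obs = torch.randn(5, 32)
    dist = policy(obs)
    actions = dist.sample()              # one action per unit, chosen independently
    log_probs = dist.log_prob(actions)   # per-unit log-probs for a policy-gradient update
    print(actions.shape, log_probs.shape)  # torch.Size([5]) torch.Size([5])
```

In a setup like this, the per-unit log-probabilities would be weighted by a shared (collective) reward signal during the reinforcement-learning stage, while the imitation stage could fit the same network to opponent demonstrations with a standard cross-entropy loss.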
Cite
Text
Peng et al. "Learning Deep Decentralized Policy Network by Collective Rewards for Real-Time Combat Game." International Joint Conference on Artificial Intelligence, 2019. doi:10.24963/IJCAI.2019/181
Markdown
[Peng et al. "Learning Deep Decentralized Policy Network by Collective Rewards for Real-Time Combat Game." International Joint Conference on Artificial Intelligence, 2019.](https://mlanthology.org/ijcai/2019/peng2019ijcai-learning/) doi:10.24963/IJCAI.2019/181
BibTeX
@inproceedings{peng2019ijcai-learning,
title = {{Learning Deep Decentralized Policy Network by Collective Rewards for Real-Time Combat Game}},
author = {Peng, Peixi and Xing, Junliang and Cao, Lili and Mu, Lisen and Huang, Chang},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2019},
pages = {1305--1311},
doi = {10.24963/IJCAI.2019/181},
url = {https://mlanthology.org/ijcai/2019/peng2019ijcai-learning/}
}