Multi-Agent Learning with Policy Prediction

Abstract

Learning in multi-agent systems is challenging because the environment is non-stationary: each agent's best response keeps shifting as the other agents adapt their own policies. This paper first introduces a new gradient-based learning algorithm that augments the basic gradient-ascent approach with policy prediction. We prove that this augmentation yields a stronger notion of convergence than basic gradient ascent: within a restricted class of iterated games, strategies converge to a Nash equilibrium. Motivated by this result, we then propose a new practical multi-agent reinforcement learning (MARL) algorithm that exploits approximate policy prediction. Empirical results show that it converges faster, and in a wider variety of situations, than state-of-the-art MARL algorithms.
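
The core idea, gradient ascent in which each player forecasts the opponent's next strategy and then climbs its own payoff gradient at that forecast, can be sketched on a two-player, two-action matrix game. The Python sketch below is a minimal illustration of that idea under stated assumptions, not the paper's exact algorithm: the game (matching pennies), the step sizes eta and gamma, and the clipping-based projection onto [0, 1] are all choices made for this example.

import numpy as np

# Payoff matrices for matching pennies (an assumed example game):
# R for the row player, C for the column player; the game is zero-sum.
R = np.array([[1.0, -1.0], [-1.0, 1.0]])
C = -R

def value_grads(alpha, beta):
    """Gradients of each player's expected payoff.

    alpha, beta are the probabilities of playing action 0. With
    V_r(alpha, beta) = [alpha, 1-alpha] R [beta, 1-beta]^T, the partial
    derivative dV_r/dalpha is linear in beta, and symmetrically
    dV_c/dbeta is linear in alpha.
    """
    u_r = R[0, 0] - R[0, 1] - R[1, 0] + R[1, 1]
    u_c = C[0, 0] - C[0, 1] - C[1, 0] + C[1, 1]
    d_alpha = u_r * beta + (R[0, 1] - R[1, 1])   # dV_r / dalpha
    d_beta = u_c * alpha + (C[1, 0] - C[1, 1])   # dV_c / dbeta
    return d_alpha, d_beta

def step_with_prediction(alpha, beta, eta=0.01, gamma=0.01):
    """One update of gradient ascent with one-step policy prediction.

    eta is the learning rate and gamma the prediction length; both
    values are illustrative assumptions.
    """
    d_alpha, d_beta = value_grads(alpha, beta)
    # Each player forecasts the opponent's next strategy by a short
    # gradient step of the opponent's own payoff...
    beta_hat = np.clip(beta + gamma * d_beta, 0.0, 1.0)
    alpha_hat = np.clip(alpha + gamma * d_alpha, 0.0, 1.0)
    # ...and ascends its own payoff gradient evaluated at that forecast.
    d_alpha_pp, _ = value_grads(alpha, beta_hat)
    _, d_beta_pp = value_grads(alpha_hat, beta)
    alpha = np.clip(alpha + eta * d_alpha_pp, 0.0, 1.0)
    beta = np.clip(beta + eta * d_beta_pp, 0.0, 1.0)
    return alpha, beta

alpha, beta = 0.9, 0.2  # arbitrary initial mixed strategies
for _ in range(20000):
    alpha, beta = step_with_prediction(alpha, beta)
print(alpha, beta)  # spirals in toward the (0.5, 0.5) Nash equilibrium

On this example game, basic gradient ascent with a constant step size cycles around the mixed equilibrium, whereas the prediction term adds a damping component that spirals both strategies in toward (0.5, 0.5), illustrating the stronger convergence notion the abstract claims.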

Cite

Text

Zhang and Lesser. "Multi-Agent Learning with Policy Prediction." AAAI Conference on Artificial Intelligence, 2010. doi:10.1609/aaai.v24i1.7639

Markdown

[Zhang and Lesser. "Multi-Agent Learning with Policy Prediction." AAAI Conference on Artificial Intelligence, 2010.](https://mlanthology.org/aaai/2010/zhang2010aaai-multi-a/) doi:10.1609/aaai.v24i1.7639

BibTeX

@inproceedings{zhang2010aaai-multi-a,
  title     = {{Multi-Agent Learning with Policy Prediction}},
  author    = {Zhang, Chongjie and Lesser, Victor R.},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2010},
  pages     = {927--934},
  doi       = {10.1609/aaai.v24i1.7639},
  url       = {https://mlanthology.org/aaai/2010/zhang2010aaai-multi-a/}
}