On Local Rewards and Scaling Distributed Reinforcement Learning

Abstract

We consider how the number of examples needed to achieve good performance in distributed, cooperative, multi-agent reinforcement learning scales with the number of agents n. We prove a worst-case lower bound showing that algorithms relying solely on a global reward signal to learn policies confront a fundamental limit: they require a number of real-world examples that scales roughly linearly in the number of agents. For settings of interest with a very large number of agents, this is impractical. We demonstrate, however, that there is a class of algorithms that, by taking advantage of local reward signals in large distributed Markov Decision Processes, are able to ensure good performance with a number of samples that scales as O(log n). This makes them applicable even in settings with a very large number of agents n.
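The intuition behind the global-versus-local contrast can be illustrated with a small toy experiment; the sketch below is not the paper's algorithm or analysis, and all names and parameters in it are hypothetical. It estimates each agent's effect on reward either from the summed global reward, where the other n-1 agents act as noise, or from that agent's own local reward, and shows the global-reward estimate degrading as n grows while the local-reward estimate does not.

import numpy as np

def estimate_contributions(n_agents=100, n_samples=50, noise_std=1.0, seed=0):
    """Toy illustration (not the paper's method): estimate each agent's
    effect on reward from a global signal vs. its own local signal."""
    rng = np.random.default_rng(seed)
    # True per-agent effect of choosing action 1 over action 0.
    true_effect = rng.uniform(0.5, 1.5, size=n_agents)

    # Random binary actions for each agent in each sample.
    actions = rng.integers(0, 2, size=(n_samples, n_agents))
    # Each agent's local reward depends only on its own action, plus noise.
    local_rewards = actions * true_effect + rng.normal(
        0, noise_std, size=(n_samples, n_agents))
    # The global reward sums all local rewards, so any one agent's signal
    # is buried in the variability contributed by the other n-1 agents.
    global_reward = local_rewards.sum(axis=1)

    def effect_estimate(rewards, agent_actions):
        # Naive estimate: mean reward when acting minus mean when not acting.
        return rewards[agent_actions == 1].mean() - rewards[agent_actions == 0].mean()

    global_err = np.mean([
        abs(effect_estimate(global_reward, actions[:, i]) - true_effect[i])
        for i in range(n_agents)])
    local_err = np.mean([
        abs(effect_estimate(local_rewards[:, i], actions[:, i]) - true_effect[i])
        for i in range(n_agents)])
    return global_err, local_err

if __name__ == "__main__":
    for n in (10, 100, 1000):
        g, l = estimate_contributions(n_agents=n)
        print(f"n={n:5d}  global-reward error={g:.3f}  local-reward error={l:.3f}")

Running this, the per-agent estimation error from the global reward grows with n while the error from local rewards stays flat, which mirrors (informally) the sample-complexity gap the abstract describes.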

Cite

Text

Bagnell and Ng. "On Local Rewards and Scaling Distributed Reinforcement Learning." Neural Information Processing Systems, 2005.

Markdown

[Bagnell and Ng. "On Local Rewards and Scaling Distributed Reinforcement Learning." Neural Information Processing Systems, 2005.](https://mlanthology.org/neurips/2005/bagnell2005neurips-local/)

BibTeX

@inproceedings{bagnell2005neurips-local,
  title     = {{On Local Rewards and Scaling Distributed Reinforcement Learning}},
  author    = {Bagnell, Drew and Ng, Andrew Y.},
  booktitle = {Neural Information Processing Systems},
  year      = {2005},
  pages     = {91--98},
  url       = {https://mlanthology.org/neurips/2005/bagnell2005neurips-local/}
}