On Local Rewards and Scaling Distributed Reinforcement Learning
Abstract
We consider how the number of examples needed to achieve good performance in distributed, cooperative, multi-agent reinforcement learning scales with the number of agents n. We prove a worst-case lower bound showing that algorithms that rely solely on a global reward signal to learn policies face a fundamental limit: they require a number of real-world examples that scales roughly linearly in the number of agents. For settings of interest with a very large number of agents, this is impractical. We demonstrate, however, that there is a class of algorithms that, by taking advantage of local reward signals in large distributed Markov Decision Processes, are able to ensure good performance with a number of samples that scales as O(log n). This makes them applicable even in settings with a very large number of agents n.
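The intuition behind the gap can be illustrated with a toy experiment (this is our own illustrative sketch, not the paper's construction or bounds): when each agent must infer the value of its actions from a global reward that sums contributions from all n agents, the other agents' rewards act as noise whose variance grows with n, so more samples are needed for the same accuracy; a local reward signal removes that noise.

```python
import numpy as np

# Toy sketch (assumed setup, not from the paper): n agents each pick action
# 0 or 1; agent i's local reward is 1 if its action matches a hidden target
# bit, else 0. The global reward is the sum of all local rewards.
# We estimate agent 0's advantage of action 1 over action 0 from m samples,
# using either the global reward or agent 0's local reward.

rng = np.random.default_rng(0)

def estimation_error(n_agents, m_samples, use_local):
    target = rng.integers(0, 2, size=n_agents)   # hidden target bits
    target[0] = 1                                # fix agent 0's target so the true gap is +1
    actions = rng.integers(0, 2, size=(m_samples, n_agents))
    local = (actions == target).astype(float)    # per-agent local rewards
    signal = local[:, 0] if use_local else local.sum(axis=1)
    # Difference of mean reward between agent 0's two actions.
    est = signal[actions[:, 0] == 1].mean() - signal[actions[:, 0] == 0].mean()
    return abs(est - 1.0)                        # true gap is 1.0 in both cases

m = 200  # fixed sample budget
for n in [10, 100, 1000]:
    err_global = np.mean([estimation_error(n, m, use_local=False) for _ in range(50)])
    err_local = np.mean([estimation_error(n, m, use_local=True) for _ in range(50)])
    print(f"n={n:5d}  global-reward error={err_global:.3f}  local-reward error={err_local:.3f}")
```

With the sample budget held fixed, the error of the global-reward estimate grows as n grows, while the local-reward estimate is essentially unaffected; this mirrors, informally, why global-reward learners need sample counts that grow with n while local-reward learners do not.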
Cite
Text
Bagnell and Ng. "On Local Rewards and Scaling Distributed Reinforcement Learning." Neural Information Processing Systems, 2005.
Markdown
[Bagnell and Ng. "On Local Rewards and Scaling Distributed Reinforcement Learning." Neural Information Processing Systems, 2005.](https://mlanthology.org/neurips/2005/bagnell2005neurips-local/)
BibTeX
@inproceedings{bagnell2005neurips-local,
  title = {{On Local Rewards and Scaling Distributed Reinforcement Learning}},
  author = {Bagnell, Drew and Ng, Andrew Y.},
  booktitle = {Neural Information Processing Systems},
  year = {2005},
  pages = {91-98},
  url = {https://mlanthology.org/neurips/2005/bagnell2005neurips-local/}
}