Scalable Coordinated Exploration in Concurrent Reinforcement Learning
Abstract
We consider a team of reinforcement learning agents that concurrently operate in a common environment, and we develop an approach to efficient coordinated exploration that is suitable for problems of practical scale. Our approach builds on the seed sampling concept introduced in Dimakopoulou and Van Roy (2018) and on a randomized value function learning algorithm from Osband et al. (2016). We demonstrate that, for simple tabular contexts, the approach is competitive with those previously proposed in Dimakopoulou and Van Roy (2018), and that, with a higher-dimensional problem and a neural network value function representation, it learns quickly with far fewer agents than alternative exploration schemes.
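To make the seed sampling concept the abstract references concrete, below is a minimal sketch in a Gaussian bandit setting, not the paper's neural-network algorithm: each agent draws a seed that fixes both a sample from the prior and a perturbation for every observation in a buffer shared by the whole team, then acts greedily on a fit to its perturbed view of that shared data. Names such as SeedSamplingAgent, NOISE_STD, and PRIOR_STD are illustrative assumptions, not from the paper.

import numpy as np

NUM_ACTIONS = 5
NOISE_STD = 0.5   # assumed scale of reward noise (illustrative)
PRIOR_STD = 1.0   # assumed prior scale over mean rewards (illustrative)

true_means = np.random.default_rng(0).normal(0.0, PRIOR_STD, NUM_ACTIONS)
shared_buffer = []  # single buffer of (action, reward) pairs pooled across all agents


class SeedSamplingAgent:
    """An agent whose seed fixes (a) a sample from the prior and (b) a noise
    term for every shared observation; fitting to the perturbed data yields
    value estimates that behave like an approximate posterior sample."""

    def __init__(self, seed):
        self.seed = seed
        self.prior_sample = np.random.default_rng(seed).normal(0.0, PRIOR_STD, NUM_ACTIONS)

    def fit(self):
        lam = (NOISE_STD / PRIOR_STD) ** 2       # ridge weight toward the prior sample
        totals = lam * self.prior_sample
        counts = np.full(NUM_ACTIONS, lam)
        for i, (a, r) in enumerate(shared_buffer):
            # Noise is a deterministic function of (agent seed, observation index),
            # so the same observation is perturbed identically on every refit.
            w = np.random.default_rng([self.seed, i]).normal(0.0, NOISE_STD)
            totals[a] += r + w
            counts[a] += 1.0
        return totals / counts                   # perturbed regularized mean rewards

    def act(self):
        return int(np.argmax(self.fit()))        # greedy w.r.t. the seeded fit


agents = [SeedSamplingAgent(seed=k) for k in range(10)]
env_rng = np.random.default_rng(123)
for _ in range(20):                              # rounds of concurrent operation
    for agent in agents:                         # every agent reads and writes the shared buffer
        a = agent.act()
        shared_buffer.append((a, true_means[a] + env_rng.normal(0.0, NOISE_STD)))

plays = np.bincount([a for a, _ in shared_buffer], minlength=NUM_ACTIONS)
print("best arm:", int(np.argmax(true_means)), "| most played:", int(np.argmax(plays)))

Because each agent's perturbations are a deterministic function of its seed, refitting as the shared buffer grows keeps that agent's value estimates self-consistent over time, which is what lets the team explore diversely while pooling all of its data.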
Cite
Text
Dimakopoulou et al. "Scalable Coordinated Exploration in Concurrent Reinforcement Learning." Neural Information Processing Systems, 2018.

Markdown

[Dimakopoulou et al. "Scalable Coordinated Exploration in Concurrent Reinforcement Learning." Neural Information Processing Systems, 2018.](https://mlanthology.org/neurips/2018/dimakopoulou2018neurips-scalable/)

BibTeX
@inproceedings{dimakopoulou2018neurips-scalable,
title = {{Scalable Coordinated Exploration in Concurrent Reinforcement Learning}},
author = {Dimakopoulou, Maria and Osband, Ian and Van Roy, Benjamin},
booktitle = {Neural Information Processing Systems},
year = {2018},
pages = {4219--4227},
url = {https://mlanthology.org/neurips/2018/dimakopoulou2018neurips-scalable/}
}