Return Dispersion as an Estimator of Learning Potential for Prioritized Level Replay

Abstract

Prioritized Level Replay (PLR) has been shown to induce adaptive curricula that improve the sample efficiency and generalization of reinforcement learning policies in environments featuring multiple tasks or levels. PLR selectively samples training levels weighted by a function of recent temporal-difference (TD) errors experienced on each level. We explore the dispersion of returns as an alternative prioritization criterion to address certain issues with TD-error scores.
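
To make the proposed criterion concrete, below is a minimal Python sketch (not the authors' implementation; the class and parameter names are invented) of sampling levels in proportion to the standard deviation of each level's recent episodic returns. Note that PLR as published uses rank-based prioritization mixed with a staleness term, which this simplified sketch omits.

```python
import numpy as np
from collections import defaultdict, deque

class DispersionLevelSampler:
    """Hypothetical sketch: prioritize levels by the dispersion (here, the
    standard deviation) of recent episodic returns, instead of TD errors."""

    def __init__(self, num_levels, window=10, temperature=1.0, rng=None):
        self.num_levels = num_levels
        # Per-level buffer holding the most recent `window` episodic returns.
        self.returns = defaultdict(lambda: deque(maxlen=window))
        self.temperature = temperature
        self.rng = rng or np.random.default_rng()

    def record_return(self, level, episodic_return):
        """Store the return observed after finishing an episode on `level`."""
        self.returns[level].append(episodic_return)

    def score(self, level):
        """Dispersion score: std of recent returns (0 until two samples exist)."""
        r = self.returns[level]
        return float(np.std(np.asarray(r))) if len(r) > 1 else 0.0

    def sample_level(self):
        """Sample a level with probability proportional to its tempered score.
        A small floor keeps unseen / zero-dispersion levels explorable."""
        scores = np.array([self.score(i) for i in range(self.num_levels)])
        scores = scores ** (1.0 / self.temperature) + 1e-6
        probs = scores / scores.sum()
        return int(self.rng.choice(self.num_levels, p=probs))
```

In a training loop, one would call `sample_level()` to pick the next level and `record_return(level, G)` after each episode; the dispersion scores then adapt as the policy's performance on each level stabilizes or fluctuates.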

Cite

Text

Korshunova et al. "Return Dispersion as an Estimator of Learning Potential for Prioritized Level Replay." NeurIPS 2021 Workshops: ICBINB, 2021.

Markdown

[Korshunova et al. "Return Dispersion as an Estimator of Learning Potential for Prioritized Level Replay." NeurIPS 2021 Workshops: ICBINB, 2021.](https://mlanthology.org/neuripsw/2021/korshunova2021neuripsw-return-a/)

BibTeX

@inproceedings{korshunova2021neuripsw-return-a,
  title     = {{Return Dispersion as an Estimator of Learning Potential for Prioritized Level Replay}},
  author    = {Korshunova, Iryna and Jiang, Minqi and Parker-Holder, Jack and Rocktäschel, Tim and Grefenstette, Edward},
  booktitle = {NeurIPS 2021 Workshops: ICBINB},
  year      = {2021},
  url       = {https://mlanthology.org/neuripsw/2021/korshunova2021neuripsw-return-a/}
}