Learning to Prioritize Planning Updates in Model-Based Reinforcement Learning

Abstract

Prioritizing the states and actions from which policy improvement is performed can improve the sample efficiency of model-based reinforcement learning systems. Although much is already known about prioritizing planning updates, more needs to be understood to operationalize these ideas in complex settings that involve non-stationary and stochastic transition dynamics, large numbers of states, and scalable function approximation architectures. Our paper presents an online meta-learning algorithm to address these needs. The algorithm finds distributions that encode priority in their probability mass. The paper evaluates the algorithm in a domain with a changing goal and with a fixed, generative transition model. Results show that prioritizing planning updates from samples of the meta-learned distribution significantly improves sample efficiency over fixed baseline distributions. Additionally, they point to a number of interesting opportunities for future research.
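To make the idea of sampling planning updates from a learned priority distribution concrete, the sketch below shows one way such a scheme could look in a simple tabular setting. It is not taken from the paper's implementation: the gridworld generative model, the softmax parameterization of the priority distribution, and the use of the absolute TD error as a REINFORCE-style meta-learning signal are all illustrative assumptions.

# Hypothetical sketch: Dyna-style planning where the state to update is drawn
# from a meta-learned priority distribution. All design choices here
# (environment, priority parameterization, meta-objective) are assumptions
# for illustration, not the paper's algorithm.
import numpy as np

rng = np.random.default_rng(0)

# Tiny deterministic gridworld used as a fixed generative model.
SIZE = 5
N_STATES, N_ACTIONS = SIZE * SIZE, 4          # actions: up, down, left, right
GAMMA, ALPHA = 0.95, 0.5                      # discount and planning step size
META_LR = 0.1                                 # step size for the priority logits
goal = N_STATES - 1                           # the goal can be moved online

def model(s, a):
    """Generative model: returns (next_state, reward) for a state-action pair."""
    r, c = divmod(int(s), SIZE)
    dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][int(a)]
    nr, nc = min(max(r + dr, 0), SIZE - 1), min(max(c + dc, 0), SIZE - 1)
    s_next = nr * SIZE + nc
    return s_next, float(s_next == goal)

Q = np.zeros((N_STATES, N_ACTIONS))
logits = np.zeros(N_STATES)                   # softmax parameters of the priority

def priority():
    z = np.exp(logits - logits.max())
    return z / z.sum()

for step in range(20_000):
    p = priority()
    s = rng.choice(N_STATES, p=p)             # sample the planning state by priority
    a = rng.integers(N_ACTIONS)
    s_next, reward = model(s, a)

    # One planning (Q-learning) update from the sampled state-action pair.
    td_error = reward + GAMMA * Q[s_next].max() - Q[s, a]
    Q[s, a] += ALPHA * td_error

    # Assumed meta-update: treat |TD error| as a proxy for how useful the update
    # was, and move probability mass toward states that yield large corrections
    # (score-function gradient of log p(s) with respect to the logits).
    grad_log_p = -p
    grad_log_p[s] += 1.0
    logits += META_LR * abs(td_error) * grad_log_p

print("Greedy value at the start state:", Q[0].max())

In this toy version, the priority distribution concentrates on states whose updates still produce large corrections, so planning effort shifts automatically when the goal changes; the paper's meta-learning objective and architecture differ, and this sketch only illustrates the overall shape of the approach.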

Cite

Text

Burega et al. "Learning to Prioritize Planning Updates in Model-Based Reinforcement Learning." NeurIPS 2022 Workshops: MetaLearn, 2022.

Markdown

[Burega et al. "Learning to Prioritize Planning Updates in Model-Based Reinforcement Learning." NeurIPS 2022 Workshops: MetaLearn, 2022.](https://mlanthology.org/neuripsw/2022/burega2022neuripsw-learning/)

BibTeX

@inproceedings{burega2022neuripsw-learning,
  title     = {{Learning to Prioritize Planning Updates in Model-Based Reinforcement Learning}},
  author    = {Burega, Bradley and Martin, John D and Bowling, Michael},
  booktitle = {NeurIPS 2022 Workshops: MetaLearn},
  year      = {2022},
  url       = {https://mlanthology.org/neuripsw/2022/burega2022neuripsw-learning/}
}