Learning to Prioritize Planning Updates in Model-Based Reinforcement Learning
Abstract
Prioritizing the states and actions from which policy improvement is performed can improve the sample efficiency of model-based reinforcement learning systems. Although much is already known about prioritizing planning updates, more needs to be understood to operationalize these ideas in complex settings that involve non-stationary and stochastic transition dynamics, large numbers of states, and scalable function approximation architectures. Our paper presents an online meta-learning algorithm to address these needs. The algorithm finds distributions that encode priority in their probability mass. The paper evaluates the algorithm in a domain with a changing goal and with a fixed, generative transition model. Results show that prioritizing planning updates from samples of the meta-learned distribution significantly improves sample efficiency over fixed baseline distributions. Additionally, they point to a number of interesting opportunities for future research.
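To make the idea of "distributions that encode priority in their probability mass" concrete, below is a minimal, hypothetical sketch of a Dyna-style planning loop that draws its planning start states from a learned softmax distribution. The update rule for the priority logits (driven by TD-error magnitude) and all names (`model`, `planning_step`, `theta`, the tabular domain sizes) are illustrative assumptions for this sketch, not the paper's meta-learning algorithm or experimental setup.

```python
import numpy as np

# Illustrative tabular setup; sizes and dynamics are assumptions, not the paper's domain.
rng = np.random.default_rng(0)
n_states, n_actions = 25, 4
gamma, alpha, beta = 0.95, 0.1, 0.05

Q = np.zeros((n_states, n_actions))
theta = np.zeros(n_states)            # logits of the priority distribution over states


def priority_dist(theta):
    """Softmax over logits: the probability mass encodes planning priority."""
    z = np.exp(theta - theta.max())
    return z / z.sum()


def model(s, a):
    """Stand-in generative transition model (simple deterministic dynamics)."""
    s_next = (s + a) % n_states
    reward = 1.0 if s_next == n_states - 1 else 0.0
    return reward, s_next


def planning_step(n_updates=10):
    """Sample planning start states from the learned distribution, then apply Q updates."""
    p = priority_dist(theta)
    for s in rng.choice(n_states, size=n_updates, p=p):
        a = rng.integers(n_actions)
        r, s_next = model(s, a)
        td_error = r + gamma * Q[s_next].max() - Q[s, a]
        Q[s, a] += alpha * td_error
        # Assumed online adjustment of the priority logits: states that still produce
        # large corrections receive more probability mass on later planning steps.
        theta[s] += beta * (abs(td_error) - p[s])


for _ in range(200):
    planning_step()
```

The design point the sketch illustrates is that priority lives in the sampling distribution itself: changing `theta` redirects where planning effort is spent without altering the value-update rule.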
Cite
Text
Burega et al. "Learning to Prioritize Planning Updates in Model-Based Reinforcement Learning." NeurIPS 2022 Workshops: MetaLearn, 2022.
Markdown
[Burega et al. "Learning to Prioritize Planning Updates in Model-Based Reinforcement Learning." NeurIPS 2022 Workshops: MetaLearn, 2022.](https://mlanthology.org/neuripsw/2022/burega2022neuripsw-learning/)
BibTeX
@inproceedings{burega2022neuripsw-learning,
title = {{Learning to Prioritize Planning Updates in Model-Based Reinforcement Learning}},
author = {Burega, Bradley and Martin, John D and Bowling, Michael},
booktitle = {NeurIPS 2022 Workshops: MetaLearn},
year = {2022},
url = {https://mlanthology.org/neuripsw/2022/burega2022neuripsw-learning/}
}