Exploration in Model-Based Reinforcement Learning by Empirically Estimating Learning Progress

Abstract

Formal exploration approaches in model-based reinforcement learning estimate the accuracy of the currently learned model without consideration of the empirical prediction error. For example, PAC-MDP approaches such as Rmax base their model certainty on the amount of collected data, while Bayesian approaches assume a prior over the transition dynamics. We propose extensions to such approaches which drive exploration solely based on empirical estimates of the learner's accuracy and learning progress. We provide a "sanity check" theoretical analysis, discussing the behavior of our extensions in the standard stationary finite state-action case. We then provide experimental studies demonstrating the robustness of these exploration measures in cases of non-stationary environments or where original approaches are misled by wrong domain assumptions.
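The abstract contrasts count-based model certainty (as in Rmax) with exploration driven by empirical estimates of the learner's accuracy and learning progress. The sketch below is a minimal illustration of that contrast in a tabular setting; the `ProgressExplorer` class, the error window, and the progress measure (recent drop in prediction error) are illustrative assumptions, not the estimators proposed in the paper.

```python
import numpy as np
from collections import defaultdict, deque

class ProgressExplorer:
    """Toy tabular learner: per (state, action) it keeps an empirical
    transition model and a short window of prediction errors, and uses the
    recent decrease in error ("learning progress") as an exploration signal."""

    def __init__(self, n_states, window=20):
        self.n_states = n_states
        # (s, a) -> counts of observed next states
        self.counts = defaultdict(lambda: np.zeros(n_states))
        # (s, a) -> recent per-sample prediction errors
        self.errors = defaultdict(lambda: deque(maxlen=window))

    def _model(self, s, a):
        # Laplace-smoothed empirical transition distribution for (s, a).
        c = self.counts[(s, a)]
        return (c + 1.0) / (c.sum() + self.n_states)

    def observe(self, s, a, s_next):
        # Score the new sample under the *current* model (negative log-likelihood),
        # then update the counts with it.
        p = self._model(s, a)
        self.errors[(s, a)].append(-np.log(p[s_next]))
        self.counts[(s, a)][s_next] += 1

    def learning_progress(self, s, a):
        # Progress = mean error over the older half of the window minus the
        # mean error over the recent half, clipped at zero.
        errs = list(self.errors[(s, a)])
        if len(errs) < 4:
            return float("inf")  # too few samples: treat the pair as unexplored
        half = len(errs) // 2
        return max(float(np.mean(errs[:half]) - np.mean(errs[half:])), 0.0)
```

A count-based scheme in the spirit of Rmax would instead declare a state-action pair "known" once its visit count reaches a fixed threshold; the progress signal above keeps a pair interesting only while its empirical prediction error is still decreasing, which is, roughly, the property the abstract credits with robustness to non-stationary environments and wrong domain assumptions.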

Cite

Text

Lopes et al. "Exploration in Model-Based Reinforcement Learning by Empirically Estimating Learning Progress." Neural Information Processing Systems, 2012.

Markdown

[Lopes et al. "Exploration in Model-Based Reinforcement Learning by Empirically Estimating Learning Progress." Neural Information Processing Systems, 2012.](https://mlanthology.org/neurips/2012/lopes2012neurips-exploration/)

BibTeX

@inproceedings{lopes2012neurips-exploration,
  title     = {{Exploration in Model-Based Reinforcement Learning by Empirically Estimating Learning Progress}},
  author    = {Lopes, Manuel and Lang, Tobias and Toussaint, Marc and Oudeyer, Pierre-Yves},
  booktitle = {Neural Information Processing Systems},
  year      = {2012},
  pages     = {206--214},
  url       = {https://mlanthology.org/neurips/2012/lopes2012neurips-exploration/}
}