Trust, but Verify: Model-Based Exploration in Sparse Reward Environments
Abstract
We propose the $\textit{trust-but-verify}$ (TBV) mechanism, a new method that uses model uncertainty estimates to guide exploration. The mechanism augments graph search planning algorithms with the capacity to deal with a learned model's imperfections. We identify a frequent type of model error, which we dub $\textit{false loops}$, that is particularly dangerous for graph search algorithms in discrete environments. These errors impose falsely pessimistic expectations and thus hinder exploration. We confirm this experimentally and show that TBV can effectively alleviate them. TBV combined with MCTS or Best First Search forms an effective model-based reinforcement learning solution that robustly solves sparse reward problems.
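The abstract describes, at a high level, how planning over a learned model can be derailed by false loops and how uncertainty-aware verification counters this. Below is a minimal, hypothetical sketch of that idea inside a best-first search loop; the `model`, `value_fn`, and threshold `tau` names are assumptions for illustration only and are not taken from the paper, whose actual algorithm may differ.

```python
# Hypothetical sketch: best-first search over a learned model that only
# "trusts" a predicted transition when the model's own uncertainty estimate
# is below a threshold. Unverified predictions are not allowed to mark
# states as visited, guarding against "false loops" (the model wrongly
# predicting a return to an already-seen state and cutting off exploration).
import heapq


def trust_but_verify_search(start, model, value_fn, max_nodes=1000, tau=0.1):
    """Expand up to `max_nodes` states, skipping transitions the model is
    uncertain about instead of treating them as reliable loop closures.

    `model.actions(s)` and `model.predict(s, a) -> (next_state, uncertainty)`
    are assumed interfaces of a user-supplied learned model.
    """
    visited = {start}
    # heapq is a min-heap, so negate the value estimate to pop the best state first.
    frontier = [(-value_fn(start), start)]
    while frontier and len(visited) < max_nodes:
        _, state = heapq.heappop(frontier)
        for action in model.actions(state):
            next_state, uncertainty = model.predict(state, action)
            if uncertainty > tau:
                # Low trust: an uncertain prediction must not close a loop;
                # it could instead be queued for verification in the real environment.
                continue
            if next_state in visited:
                continue
            visited.add(next_state)
            heapq.heappush(frontier, (-value_fn(next_state), next_state))
    return visited
```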
Cite
Text
Czechowski et al. "Trust, but Verify: Model-Based Exploration in Sparse Reward Environments." NeurIPS 2020 Workshops: LMCA, 2020.
Markdown
[Czechowski et al. "Trust, but Verify: Model-Based Exploration in Sparse Reward Environments." NeurIPS 2020 Workshops: LMCA, 2020.](https://mlanthology.org/neuripsw/2020/czechowski2020neuripsw-trust/)
BibTeX
@inproceedings{czechowski2020neuripsw-trust,
  title = {{Trust, but Verify: Model-Based Exploration in Sparse Reward Environments}},
  author = {Czechowski, Konrad and Odrzygóźdź, Tomasz and Izworski, Michał and Zbysiński, Marek and Kuciński, Łukasz and Miłoś, Piotr},
  booktitle = {NeurIPS 2020 Workshops: LMCA},
  year = {2020},
  url = {https://mlanthology.org/neuripsw/2020/czechowski2020neuripsw-trust/}
}