Model-Value Inconsistency as a Signal for Epistemic Uncertainty

Abstract

Using a model of the environment and a value function, an agent can construct many estimates of a state's value by unrolling the model for different lengths and bootstrapping with its value function. Our key insight is that this set of value estimates can be treated as a type of ensemble, which we call an *implicit value ensemble* (IVE). Consequently, the discrepancy between these estimates can be used as a proxy for the agent's epistemic uncertainty; we term this signal *model-value inconsistency*, or *self-inconsistency* for short. Unlike prior work, which estimates uncertainty by training an ensemble of many models and/or value functions, this approach requires only the single model and value function that are already being learned in most model-based reinforcement learning algorithms. We provide empirical evidence, in both tabular settings and function-approximation settings from pixels, that self-inconsistency is useful (i) as a signal for exploration, (ii) for acting safely under distribution shifts, and (iii) for robustifying value-based planning with a learned model.
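The following is a minimal sketch of the idea described above, assuming a deterministic one-step learned model `model(state, action) -> (next_state, reward)`, a value function `value(state)`, and a policy `policy(state)`; these function names, and the use of the standard deviation as the discrepancy measure, are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

def implicit_value_ensemble(state, model, value, policy, gamma=0.99, max_k=5):
    """Return the k-step value estimates {v_0(state), ..., v_max_k(state)}.

    Each v_k unrolls the learned model for k steps under the policy,
    accumulates the (discounted) model-predicted rewards, and bootstraps
    with the value function at the resulting state.
    """
    estimates = []
    discounted_return, discount = 0.0, 1.0
    for k in range(max_k + 1):
        # k-step estimate: accumulated model rewards plus bootstrapped value.
        estimates.append(discounted_return + discount * value(state))
        if k < max_k:
            action = policy(state)
            state, reward = model(state, action)  # one step of the learned model
            discounted_return += discount * reward
            discount *= gamma
    return np.array(estimates)

def self_inconsistency(estimates):
    # Discrepancy across the implicit value ensemble (here, the standard
    # deviation) serves as a proxy for epistemic uncertainty.
    return float(np.std(estimates))
```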

Cite

Text

Filos et al. "Model-Value Inconsistency as a Signal for Epistemic Uncertainty." ICLR 2022 Workshops: ALOE, 2022.

Markdown

[Filos et al. "Model-Value Inconsistency as a Signal for Epistemic Uncertainty." ICLR 2022 Workshops: ALOE, 2022.](https://mlanthology.org/iclrw/2022/filos2022iclrw-modelvalue/)

BibTeX

@inproceedings{filos2022iclrw-modelvalue,
  title     = {{Model-Value Inconsistency as a Signal for Epistemic Uncertainty}},
  author    = {Filos, Angelos and Vértes, Eszter and Marinho, Zita and Farquhar, Gregory and Borsa, Diana L. and Friesen, Abram L. and Behbahani, Feryal and Schaul, Tom and Barreto, Andre and Osindero, Simon},
  booktitle = {ICLR 2022 Workshops: ALOE},
  year      = {2022},
  url       = {https://mlanthology.org/iclrw/2022/filos2022iclrw-modelvalue/}
}