Latent Geodesics of Model Dynamics for Offline Reinforcement Learning

Abstract

Model-based offline reinforcement learning approaches generally rely on bounds on model error. While contemporary methods achieve such bounds through an ensemble of models, we propose to estimate them using a data-driven latent metric. In particular, we build upon recent advances in the Riemannian geometry of generative models to construct a latent metric of an encoder-decoder based forward model. Our proposed metric measures both the quality of out-of-distribution samples and the discrepancy of examples in the data. We show that our metric can be viewed as a combination of two metrics, one relating to proximity and the other to epistemic uncertainty. Finally, we leverage our metric in a pessimistic model-based framework, showing a significant improvement over contemporary model-based offline reinforcement learning benchmarks.
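The latent metric described above builds on the pullback metric of a decoder: a curve's length in latent space is measured through the decoder's Jacobian, so that distances reflect how the decoded outputs change. A minimal sketch of this construction is below; the toy decoder, its dimensions, and the finite-difference Jacobian are illustrative assumptions, not the paper's actual model or metric.

```python
import numpy as np

# Hypothetical toy decoder g: R^2 -> R^3 (architecture is an
# illustrative assumption, not the paper's forward model).
rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 2))
W2 = rng.normal(size=(3, 8))

def decode(z):
    return W2 @ np.tanh(W1 @ z)

def jacobian(z, eps=1e-6):
    """Central finite-difference Jacobian of the decoder at z."""
    J = np.zeros((3, 2))
    for i in range(2):
        dz = np.zeros(2)
        dz[i] = eps
        J[:, i] = (decode(z + dz) - decode(z - dz)) / (2 * eps)
    return J

def pullback_metric(z):
    """Riemannian metric induced on latent space: G(z) = J(z)^T J(z)."""
    J = jacobian(z)
    return J.T @ J

def curve_length(z0, z1, n=100):
    """Length of the straight latent segment under the pullback metric,
    approximated by a Riemann sum; a geodesic minimizes this quantity."""
    ts = np.linspace(0.0, 1.0, n, endpoint=False)
    dz = (z1 - z0) / n
    return sum(
        np.sqrt(dz @ pullback_metric((1 - t) * z0 + t * z1) @ dz)
        for t in ts
    )
```

Because G(z) = J(z)^T J(z) is always symmetric positive semidefinite, `curve_length` is a valid (pseudo-)Riemannian length, and geodesics under it avoid latent regions where the decoder is poorly behaved.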

Cite

Text

Tennenholtz et al. "Latent Geodesics of Model Dynamics for Offline Reinforcement Learning." NeurIPS 2021 Workshops: DeepRL, 2021.

Markdown

[Tennenholtz et al. "Latent Geodesics of Model Dynamics for Offline Reinforcement Learning." NeurIPS 2021 Workshops: DeepRL, 2021.](https://mlanthology.org/neuripsw/2021/tennenholtz2021neuripsw-latent/)

BibTeX

@inproceedings{tennenholtz2021neuripsw-latent,
  title     = {{Latent Geodesics of Model Dynamics for Offline Reinforcement Learning}},
  author    = {Tennenholtz, Guy and Baram, Nir and Mannor, Shie},
  booktitle = {NeurIPS 2021 Workshops: DeepRL},
  year      = {2021},
  url       = {https://mlanthology.org/neuripsw/2021/tennenholtz2021neuripsw-latent/}
}