Provably Efficient RL with Rich Observations via Latent State Decoding

Abstract

We study the exploration problem in episodic MDPs with rich observations generated from a small number of latent states. Under certain identifiability assumptions, we demonstrate how to estimate a mapping from the observations to latent states inductively through a sequence of regression and clustering steps—where previously decoded latent states provide labels for later regression problems—and use it to construct good exploration policies. We provide finite-sample guarantees on the quality of the learned state decoding function and exploration policies, and complement our theory with an empirical evaluation on a class of hard exploration problems. Our method exponentially improves over $Q$-learning with naïve exploration, even when $Q$-learning has cheating access to latent states.
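The inductive decoding idea sketched in the abstract — previously decoded latent states supply labels for a regression problem, and clustering the regression outputs yields the next level's latent states — can be illustrated with a minimal sketch. This is an illustrative simplification, not the paper's algorithm: it uses plain least-squares regression and Lloyd's k-means on synthetic data, and the function name `decode_level` and its arguments are hypothetical.

```python
import numpy as np

def decode_level(obs, prev_context, n_latent, n_iter=50):
    """Hypothetical one-step decoder (sketch, not the paper's method):
    regress from level-h observations onto a one-hot encoding of the
    previously decoded (state, action) context, then cluster the
    predicted vectors to obtain level-h latent-state labels."""
    # One-hot targets built from the already-decoded context labels.
    n_ctx = int(prev_context.max()) + 1
    Y = np.eye(n_ctx)[prev_context]
    X = np.hstack([obs, np.ones((len(obs), 1))])  # bias column
    # Least-squares regression: predicted "backward" context vectors.
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    preds = X @ W
    # Farthest-point initialization keeps this sketch deterministic.
    centers = [preds[0]]
    for _ in range(n_latent - 1):
        d = np.min([np.linalg.norm(preds - c, axis=1) for c in centers], axis=0)
        centers.append(preds[d.argmax()])
    centers = np.array(centers)
    # Lloyd's k-means on the predicted vectors.
    for _ in range(n_iter):
        d = np.linalg.norm(preds[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for k in range(n_latent):
            if np.any(labels == k):
                centers[k] = preds[labels == k].mean(axis=0)
    return labels
```

Clustering the predicted context vectors, rather than the raw observations, reflects the intuition in the abstract: observations emitted from the same latent state share (approximately) the same conditional distribution over the preceding decoded states, so their regression outputs coincide even when the raw observations look very different.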

Cite

Text

Du et al. "Provably Efficient RL with Rich Observations via Latent State Decoding." International Conference on Machine Learning, 2019.

Markdown

[Du et al. "Provably Efficient RL with Rich Observations via Latent State Decoding." International Conference on Machine Learning, 2019.](https://mlanthology.org/icml/2019/du2019icml-provably/)

BibTeX

@inproceedings{du2019icml-provably,
  title     = {{Provably Efficient RL with Rich Observations via Latent State Decoding}},
  author    = {Du, Simon and Krishnamurthy, Akshay and Jiang, Nan and Agarwal, Alekh and Dud{\'i}k, Miroslav and Langford, John},
  booktitle = {International Conference on Machine Learning},
  year      = {2019},
  pages     = {1665--1674},
  volume    = {97},
  url       = {https://mlanthology.org/icml/2019/du2019icml-provably/}
}