DeepMDP: Learning Continuous Latent Space Models for Representation Learning

Abstract

Many reinforcement learning (RL) tasks provide the agent with high-dimensional observations that can be simplified into low-dimensional continuous states. To formalize this process, we introduce the concept of a *DeepMDP*, a parameterized latent space model that is trained via the minimization of two tractable latent space losses: prediction of rewards and prediction of the distribution over next latent states. We show that the optimization of these objectives guarantees (1) the quality of the embedding function as a representation of the state space and (2) the quality of the DeepMDP as a model of the environment. Our theoretical findings are substantiated by the experimental result that a trained DeepMDP recovers the latent structure underlying high-dimensional observations on a synthetic environment. Finally, we show that learning a DeepMDP as an auxiliary task in the Atari 2600 domain leads to large performance improvements over model-free RL.
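The abstract describes training an embedding alongside two latent-space objectives: a reward-prediction loss and a next-latent-state prediction loss. Below is a minimal PyTorch sketch of that idea, not the authors' implementation; the module and argument names (`phi`, `reward`, `transition`, `losses`) are illustrative, and a deterministic latent transition model with a squared-error penalty stands in for the paper's distributional (Wasserstein-based) transition loss.

```python
# Illustrative sketch of the two DeepMDP-style latent losses (assumed names,
# not the authors' code): an encoder phi, a latent reward model, and a
# deterministic latent transition model trained on embedded transitions.
import torch
import torch.nn as nn

class DeepMDPSketch(nn.Module):
    def __init__(self, obs_dim: int, action_dim: int, latent_dim: int):
        super().__init__()
        # Embedding function phi: observation -> latent state.
        self.phi = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                 nn.Linear(256, latent_dim))
        # Latent reward model: (latent state, action) -> predicted reward.
        self.reward = nn.Sequential(nn.Linear(latent_dim + action_dim, 256),
                                    nn.ReLU(), nn.Linear(256, 1))
        # Latent transition model: (latent state, action) -> next latent state.
        self.transition = nn.Sequential(nn.Linear(latent_dim + action_dim, 256),
                                        nn.ReLU(), nn.Linear(256, latent_dim))

    def losses(self, obs, action, reward, next_obs):
        z = self.phi(obs)
        za = torch.cat([z, action], dim=-1)
        # (1) Reward loss: predict the observed reward from the latent state.
        reward_loss = (self.reward(za).squeeze(-1) - reward).pow(2).mean()
        # (2) Transition loss: predict the embedding of the next observation
        #     (target embedding held fixed; an L2 surrogate for the
        #     distributional loss used in the paper).
        with torch.no_grad():
            z_next = self.phi(next_obs)
        transition_loss = (self.transition(za) - z_next).pow(2).mean()
        return reward_loss, transition_loss
```

When used as an auxiliary task, the sum of these two losses would simply be added to the agent's usual RL objective and minimized over the same batches of transitions.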

Cite

Text

Gelada et al. "DeepMDP: Learning Continuous Latent Space Models for Representation Learning." International Conference on Machine Learning, 2019.

Markdown

[Gelada et al. "DeepMDP: Learning Continuous Latent Space Models for Representation Learning." International Conference on Machine Learning, 2019.](https://mlanthology.org/icml/2019/gelada2019icml-deepmdp/)

BibTeX

@inproceedings{gelada2019icml-deepmdp,
  title     = {{DeepMDP: Learning Continuous Latent Space Models for Representation Learning}},
  author    = {Gelada, Carles and Kumar, Saurabh and Buckman, Jacob and Nachum, Ofir and Bellemare, Marc G.},
  booktitle = {International Conference on Machine Learning},
  year      = {2019},
  pages     = {2170--2179},
  volume    = {97},
  url       = {https://mlanthology.org/icml/2019/gelada2019icml-deepmdp/}
}