Entropic Desired Dynamics for Intrinsic Control

Abstract

An agent might be said, informally, to have mastery of its environment when it has maximized the effective number of states it can reliably reach. In practice, this often means maximizing the number of latent codes that can be discriminated from future states under some short time horizon (e.g. Eysenbach et al., 2018). By situating these latent codes in a globally consistent coordinate system, we show that agents can reliably reach more states in the long term while still optimizing a local objective. A simple instantiation of this idea, **E**ntropic **D**esired **D**ynamics for **I**ntrinsic **C**on**T**rol (EDDICT), assumes fixed additive latent dynamics, which results in tractable learning and an interpretable latent space. Compared to prior methods, EDDICT's globally consistent codes allow it to be far more exploratory, as demonstrated by improved state coverage and increased unsupervised performance on hard exploration games such as Montezuma's Revenge.
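The "fixed additive latent dynamics" assumption can be pictured with a minimal sketch: desired latent codes drift by a constant vector each step, so every code lives in one global coordinate system rather than being re-sampled per episode. Everything below (the 2-D latent space, the variable names, the unit-norm drift) is an illustrative assumption, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def desired_trajectory(z0, g, horizon):
    """Fixed additive latent dynamics: z_{t+1} = z_t + g, unrolled from z0."""
    return np.array([z0 + t * g for t in range(horizon)])

# A globally consistent coordinate system: all codes share one origin and one
# drift direction, so distant codes remain comparable across long horizons.
z0 = np.zeros(2)
g = rng.normal(size=2)
g /= np.linalg.norm(g)  # unit-norm drift ("desired dynamics" direction)

codes = desired_trajectory(z0, g, horizon=5)

# Every increment between consecutive desired codes equals the fixed drift g.
print(np.allclose(np.diff(codes, axis=0), g))  # prints True
```

Because the dynamics are fixed and additive, the code at any horizon is just `z0 + t * g`, which is what makes the latent space interpretable and the learning problem tractable in the sense the abstract describes.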

Cite

Text

Hansen et al. "Entropic Desired Dynamics for Intrinsic Control." Neural Information Processing Systems, 2021.

Markdown

[Hansen et al. "Entropic Desired Dynamics for Intrinsic Control." Neural Information Processing Systems, 2021.](https://mlanthology.org/neurips/2021/hansen2021neurips-entropic/)

BibTeX

@inproceedings{hansen2021neurips-entropic,
  title     = {{Entropic Desired Dynamics for Intrinsic Control}},
  author    = {Hansen, Steven and Desjardins, Guillaume and Baumli, Kate and Warde-Farley, David and Heess, Nicolas and Osindero, Simon and Mnih, Volodymyr},
  booktitle = {Neural Information Processing Systems},
  year      = {2021},
  url       = {https://mlanthology.org/neurips/2021/hansen2021neurips-entropic/}
}