Non-Markovian Policies for Unsupervised Reinforcement Learning in Multiple Environments

Abstract

In recent years, the area of Unsupervised Reinforcement Learning (URL) has gained particular relevance as a way to foster the generalization of reinforcement learning agents. In this setting, the agent's policy is first pre-trained in an unknown environment via reward-free interactions, often through a pure exploration objective that drives the agent towards uniform coverage of the state space. It has been shown that this pre-training improves efficiency on the downstream supervised tasks the agent is later asked to solve. When dealing with unsupervised pre-training in multiple environments, one should also account for potential trade-offs in exploration performance across the set of environments, which leads to the following question: Can we pre-train a policy that is simultaneously optimal in all the environments? In this work, we address this question by proposing a novel non-Markovian policy architecture to be pre-trained with the common maximum state entropy objective. This architecture showcases significant empirical advantages over state-of-the-art Markovian agents for URL.
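
To ground the term "maximum state entropy objective" mentioned above, the following is a minimal illustrative formulation, not the authors' exact one: it assumes a policy \pi inducing a state distribution d^{\pi}_{M} over some horizon in an environment M, and a class of environments \mathcal{M} for the multi-environment case.

\max_{\pi} \; H\big(d^{\pi}_{M}\big) \;=\; \max_{\pi} \; -\sum_{s} d^{\pi}_{M}(s) \,\log d^{\pi}_{M}(s)

A pre-training objective over multiple environments could then aggregate these entropies across \mathcal{M}, for instance through an average or a worst-case criterion such as

\max_{\pi} \; \min_{M \in \mathcal{M}} H\big(d^{\pi}_{M}\big)

How the paper actually aggregates across environments, and how the non-Markovian policy conditions on the interaction history rather than on the current state alone, is specified in the full text; the expressions above only sketch the kind of objective involved.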

Cite

Text

Maldini et al. "Non-Markovian Policies for Unsupervised Reinforcement Learning in Multiple Environments." ICML 2022 Workshops: DARL, 2022.

Markdown

[Maldini et al. "Non-Markovian Policies for Unsupervised Reinforcement Learning in Multiple Environments." ICML 2022 Workshops: DARL, 2022.](https://mlanthology.org/icmlw/2022/maldini2022icmlw-nonmarkovian/)

BibTeX

@inproceedings{maldini2022icmlw-nonmarkovian,
  title     = {{Non-Markovian Policies for Unsupervised Reinforcement Learning in Multiple Environments}},
  author    = {Maldini, Pietro and Mutti, Mirco and De Santi, Riccardo and Restelli, Marcello},
  booktitle = {ICML 2022 Workshops: DARL},
  year      = {2022},
  url       = {https://mlanthology.org/icmlw/2022/maldini2022icmlw-nonmarkovian/}
}