Non-Markovian Policies for Unsupervised Reinforcement Learning in Multiple Environments
Abstract
In recent years, the area of Unsupervised Reinforcement Learning (URL) has gained particular relevance as a way to foster generalization in reinforcement learning agents. In this setting, the agent's policy is first pre-trained in an unknown environment via reward-free interactions, often through a pure exploration objective that drives the agent towards uniform coverage of the state space. This pre-training has been shown to improve efficiency on downstream supervised tasks later given to the agent to solve. When pre-training without supervision in multiple environments, one should also account for potential trade-offs in exploration performance across the set of environments, which leads to the following question: can we pre-train a policy that is simultaneously optimal in all the environments? In this work, we address this question by proposing a novel non-Markovian policy architecture to be pre-trained with the common maximum state entropy objective. This architecture showcases significant empirical advantages over state-of-the-art Markovian agents for URL.
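The sketch below is a minimal illustration, not the authors' implementation: it contrasts a Markovian policy (acting on the current observation only) with a non-Markovian, history-conditioned one, and uses a particle-based k-nearest-neighbor estimate as a stand-in for the state entropy objective mentioned in the abstract. The network sizes, the GRU encoder, and the `knn_entropy` helper are illustrative assumptions.

```python
# Hedged sketch: Markovian vs. non-Markovian (history-conditioned) policies and
# a k-NN proxy for state entropy. Not the paper's architecture or estimator.
import torch
import torch.nn as nn


def knn_entropy(states: torch.Tensor, k: int = 4) -> torch.Tensor:
    """Nonparametric entropy proxy: mean log distance to the k-th nearest
    neighbor over a batch of visited states (higher = more uniform coverage)."""
    dists = torch.cdist(states, states)                         # pairwise distances
    knn_dist = dists.topk(k + 1, largest=False).values[:, -1]   # k-th neighbor, skipping self
    return torch.log(knn_dist + 1e-8).mean()


class MarkovianPolicy(nn.Module):
    """Acts on the current observation only."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, n_actions))

    def forward(self, obs):                       # obs: (batch, obs_dim)
        return torch.distributions.Categorical(logits=self.net(obs))


class NonMarkovianPolicy(nn.Module):
    """Acts on the whole interaction history via a recurrent encoder, so its
    behavior can adapt to whichever environment it is currently exploring."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_history):               # obs_history: (batch, T, obs_dim)
        h, _ = self.rnn(obs_history)
        return torch.distributions.Categorical(logits=self.head(h[:, -1]))


# Usage: score a rollout's coverage and sample an action from a 10-step history.
states = torch.randn(256, 3)                      # visited states from one rollout
print("entropy proxy:", knn_entropy(states).item())
policy = NonMarkovianPolicy(obs_dim=3, n_actions=4)
action = policy(torch.randn(1, 10, 3)).sample()
```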
Cite
Text
Maldini et al. "Non-Markovian Policies for Unsupervised Reinforcement Learning in Multiple Environments." ICML 2022 Workshops: Pre-Training, 2022.
Markdown
[Maldini et al. "Non-Markovian Policies for Unsupervised Reinforcement Learning in Multiple Environments." ICML 2022 Workshops: Pre-Training, 2022.](https://mlanthology.org/icmlw/2022/maldini2022icmlw-nonmarkovian-a/)
BibTeX
@inproceedings{maldini2022icmlw-nonmarkovian-a,
title = {{Non-Markovian Policies for Unsupervised Reinforcement Learning in Multiple Environments}},
author = {Maldini, Pietro and Mutti, Mirco and De Santi, Riccardo and Restelli, Marcello},
booktitle = {ICML 2022 Workshops: Pre-Training},
year = {2022},
url = {https://mlanthology.org/icmlw/2022/maldini2022icmlw-nonmarkovian-a/}
}