An Empirical Investigation of Mutual Information Skill Learning
Abstract
Unsupervised skill learning methods are a form of unsupervised pre-training for reinforcement learning (RL) that has the potential to improve the sample efficiency of solving downstream tasks. Prior work has proposed several methods for unsupervised skill discovery based on mutual information (MI) objectives, with different methods varying in how this mutual information is estimated and optimized. This paper studies how different design decisions in skill learning algorithms affect the sample efficiency of solving downstream tasks. Our key findings are: (i) off-policy backbones yield better sample efficiency during downstream adaptation than their on-policy counterparts, whereas on-policy backbones achieve better state coverage; (ii) regularizing the discriminator improves downstream results; and (iii) careful choice of the mutual information lower bound and the discriminator architecture yields significant improvements in downstream returns. We also show empirically that the representations learned during the pre-training step correspond to the controllable aspects of the environment.
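For context, the MI objectives referenced above typically maximize the mutual information between states and skills. A standard variational lower bound from this literature (e.g. DIAYN-style skill discovery; shown here as an illustration, not quoted from the paper) replaces the intractable posterior over skills with a learned discriminator $q_\phi(z \mid s)$:

```latex
% Skill discovery maximizes I(S;Z) between visited states S and skills Z.
% The posterior p(z|s) is intractable, so a learned discriminator
% q_phi(z|s) gives a variational lower bound (by non-negativity of the KL
% divergence between p(z|s) and q_phi(z|s)):
\begin{align}
I(S;Z) &= \mathcal{H}(Z) - \mathcal{H}(Z \mid S) \\
       &\geq \mathcal{H}(Z)
          + \mathbb{E}_{z \sim p(z),\; s \sim \pi_z}
            \left[ \log q_\phi(z \mid s) \right].
\end{align}
```

The bound is tight when $q_\phi(z \mid s) = p(z \mid s)$; the discriminator's log-likelihood then serves as an intrinsic reward for the skill-conditioned policy $\pi_z$, which is why the choice of lower bound and discriminator architecture matters for downstream performance.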
Cite

Text:
Mohamed et al. "An Empirical Investigation of Mutual Information Skill Learning." ICLR 2022 Workshops: ALOE, 2022.

Markdown:
[Mohamed et al. "An Empirical Investigation of Mutual Information Skill Learning." ICLR 2022 Workshops: ALOE, 2022.](https://mlanthology.org/iclrw/2022/mohamed2022iclrw-empirical/)

BibTeX:
@inproceedings{mohamed2022iclrw-empirical,
  title     = {{An Empirical Investigation of Mutual Information Skill Learning}},
  author    = {Mohamed, Faisal and Eysenbach, Benjamin and Salakhutdinov, Russ},
  booktitle = {ICLR 2022 Workshops: ALOE},
  year      = {2022},
  url       = {https://mlanthology.org/iclrw/2022/mohamed2022iclrw-empirical/}
}