State Entropy Maximization with Random Encoders for Efficient Exploration

Abstract

Recent exploration methods have proven to be a recipe for improving sample-efficiency in deep reinforcement learning (RL). However, efficient exploration in high-dimensional observation spaces still remains a challenge. This paper presents Random Encoders for Efficient Exploration (RE3), an exploration method that utilizes state entropy as an intrinsic reward. In order to estimate state entropy in environments with high-dimensional observations, we utilize a k-nearest neighbor entropy estimator in the low-dimensional representation space of a convolutional encoder. In particular, we find that the state entropy can be estimated in a stable and compute-efficient manner by utilizing a randomly initialized encoder, which is fixed throughout training. Our experiments show that RE3 significantly improves the sample-efficiency of both model-free and model-based RL methods on locomotion and navigation tasks from DeepMind Control Suite and MiniGrid benchmarks. We also show that RE3 allows learning diverse behaviors without extrinsic rewards, effectively improving sample-efficiency in downstream tasks.
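To make the abstract's core idea concrete, below is a minimal sketch of an RE3-style intrinsic reward: observations are embedded by a randomly initialized convolutional encoder whose weights are never updated, and the reward for each state is the log of its distance to its k-th nearest neighbor in that representation space, which serves as a k-NN proxy for state entropy. The class and function names (RandomEncoder, re3_intrinsic_reward), the encoder architecture, and the default hyperparameters are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn as nn


class RandomEncoder(nn.Module):
    """Convolutional encoder kept frozen at its random initialization."""

    def __init__(self, obs_shape=(3, 84, 84), feature_dim=50):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(obs_shape[0], 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():
            n_flat = self.convs(torch.zeros(1, *obs_shape)).shape[1]
        self.fc = nn.Linear(n_flat, feature_dim)
        # Freeze everything: the encoder is never trained.
        for p in self.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def forward(self, obs):
        return self.fc(self.convs(obs))


@torch.no_grad()
def re3_intrinsic_reward(encoder, obs_batch, k=3):
    """k-NN state-entropy proxy: r_i = log(||y_i - y_i^(k-NN)||_2 + 1)."""
    y = encoder(obs_batch)                     # (N, feature_dim) representations
    dists = torch.cdist(y, y)                  # pairwise L2 distances
    # Take the (k+1)-th smallest distance per row to skip the zero self-distance.
    knn_dist = dists.topk(k + 1, largest=False).values[:, -1]
    return torch.log(knn_dist + 1.0)

In practice this intrinsic reward would be added to (or, in the reward-free setting, used in place of) the extrinsic reward when updating the underlying RL agent; because the encoder is fixed, the representations of stored transitions never go stale and the extra cost is a single forward pass plus a nearest-neighbor search over the batch.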

Cite

Text

Seo et al. "State Entropy Maximization with Random Encoders for Efficient Exploration." International Conference on Machine Learning, 2021.

Markdown

[Seo et al. "State Entropy Maximization with Random Encoders for Efficient Exploration." International Conference on Machine Learning, 2021.](https://mlanthology.org/icml/2021/seo2021icml-state/)

BibTeX

@inproceedings{seo2021icml-state,
  title     = {{State Entropy Maximization with Random Encoders for Efficient Exploration}},
  author    = {Seo, Younggyo and Chen, Lili and Shin, Jinwoo and Lee, Honglak and Abbeel, Pieter and Lee, Kimin},
  booktitle = {International Conference on Machine Learning},
  year      = {2021},
  pages     = {9443--9454},
  volume    = {139},
  url       = {https://mlanthology.org/icml/2021/seo2021icml-state/}
}