Maximum Entropy Model-Based Reinforcement Learning
Abstract
Recent advances in reinforcement learning have demonstrated its ability to solve hard agent-environment interaction tasks at a super-human level. However, the application of reinforcement learning methods to practical, real-world tasks is currently limited by the sample inefficiency of most state-of-the-art RL algorithms, i.e., their need for a vast number of training episodes. For example, the OpenAI Five algorithm that beat human players in Dota 2 was trained for the equivalent of thousands of years of game time. Several approaches tackle the issue of sample inefficiency, either by making more efficient use of already gathered experience or by gaining more relevant and diverse experience through better exploration of the environment. However, to our knowledge, no such approach exists for model-based algorithms, which have shown high sample efficiency in solving hard control tasks with high-dimensional state spaces. This work connects exploration techniques and model-based reinforcement learning. We design a novel exploration method that takes into account features of the model-based approach. We also demonstrate through experiments that our method significantly improves the performance of the model-based Dreamer algorithm.
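The abstract does not spell out the exploration mechanism, but a minimal sketch can illustrate the general idea of combining an entropy-based exploration bonus with a learned world model. Everything below is an illustrative assumption rather than the authors' algorithm: the Gaussian next-state prediction, the `entropy_bonus` helper, and the mixing coefficient `beta` are all hypothetical.

```python
# Hypothetical sketch (not the paper's method): reward states whose predicted
# next-state distribution under the world model has high entropy, encouraging
# the agent to explore where the model is uncertain.
import numpy as np

def entropy_bonus(next_state_std: np.ndarray) -> float:
    """Differential entropy of a diagonal-Gaussian next-state prediction.

    For N(mu, diag(sigma^2)) the entropy is 0.5 * sum(log(2*pi*e*sigma^2)),
    so more uncertain predictions yield a larger exploration bonus.
    """
    return float(0.5 * np.sum(np.log(2.0 * np.pi * np.e * next_state_std ** 2)))

def shaped_reward(extrinsic_reward: float,
                  next_state_std: np.ndarray,
                  beta: float = 0.1) -> float:
    """Mix the task reward with the entropy bonus; beta trades off exploration."""
    return extrinsic_reward + beta * entropy_bonus(next_state_std)

# Usage: suppose the world model predicts the next latent state with this std.
sigma = np.full(4, 0.5)
print(shaped_reward(extrinsic_reward=1.0, next_state_std=sigma))
```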
Cite
Text
Svidchenko and Shpilman. "Maximum Entropy Model-Based Reinforcement Learning." NeurIPS 2021 Workshops: DeepRL, 2021.

Markdown

[Svidchenko and Shpilman. "Maximum Entropy Model-Based Reinforcement Learning." NeurIPS 2021 Workshops: DeepRL, 2021.](https://mlanthology.org/neuripsw/2021/svidchenko2021neuripsw-maximum/)

BibTeX
@inproceedings{svidchenko2021neuripsw-maximum,
title = {{Maximum Entropy Model-Based Reinforcement Learning}},
author = {Svidchenko, Oleg and Shpilman, Aleksei},
booktitle = {NeurIPS 2021 Workshops: DeepRL},
year = {2021},
url = {https://mlanthology.org/neuripsw/2021/svidchenko2021neuripsw-maximum/}
}