Maximum Entropy-Regularized Multi-Goal Reinforcement Learning

Abstract

In Multi-Goal Reinforcement Learning, an agent learns to achieve multiple goals with a goal-conditioned policy. During learning, the agent first collects trajectories into a replay buffer, and later samples these trajectories uniformly at random for replay. However, the achieved goals in the replay buffer are often biased towards the behavior policies. From a Bayesian perspective, when there is no prior knowledge about the target goal distribution, the agent should learn uniformly from diverse achieved goals. Therefore, we first propose a novel multi-goal RL objective based on weighted entropy. This objective encourages the agent to maximize the expected return as well as to achieve more diverse goals. Secondly, we develop a maximum entropy-based prioritization framework to optimize the proposed objective. To evaluate this framework, we combine it with Deep Deterministic Policy Gradient, both with and without Hindsight Experience Replay. On a set of multi-goal robotic tasks from OpenAI Gym, we compare our method with other baselines and show promising improvements in both performance and sample efficiency.
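The prioritization idea in the abstract admits a compact illustration. Below is a minimal sketch, not the authors' implementation: fit a density model over the achieved goals in the replay buffer, then replay trajectories with rare achieved goals more often, which pushes the distribution of replayed goals toward the uniform, maximum-entropy distribution. This matches the weighted-entropy view, H^w(p) = -sum_i w_i p_i log p_i, where each goal's surprise -log p_i is weighted. The kernel density estimator, the bandwidth value, and the rank-based weighting here are illustrative assumptions, not choices taken from the paper.

import numpy as np
from sklearn.neighbors import KernelDensity

def prioritized_indices(achieved_goals, batch_size, bandwidth=0.1, seed=None):
    """Sample trajectory indices, favoring trajectories with rare achieved goals.

    achieved_goals: (num_trajectories, goal_dim) array, e.g. the final
    achieved goal of each trajectory in the replay buffer.
    """
    rng = np.random.default_rng(seed)
    # Estimate the density of achieved goals; low density means a rare goal.
    kde = KernelDensity(bandwidth=bandwidth).fit(achieved_goals)
    log_density = kde.score_samples(achieved_goals)  # log p(g) per trajectory
    # Rank trajectories from most common (rank 0) to rarest; rarer goals get
    # a proportionally higher replay probability, flattening the distribution
    # of replayed goals toward uniform (maximum entropy over goals).
    ranks = np.argsort(np.argsort(-log_density))
    probs = (ranks + 1) / (ranks + 1).sum()
    return rng.choice(len(achieved_goals), size=batch_size, p=probs)

For example, prioritized_indices(goals, 64) draws a minibatch of 64 trajectory indices; in a full agent the density model would be refit periodically as new trajectories arrive.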

Cite

Text

Zhao et al. "Maximum Entropy-Regularized Multi-Goal Reinforcement Learning." International Conference on Machine Learning, 2019.

Markdown

[Zhao et al. "Maximum Entropy-Regularized Multi-Goal Reinforcement Learning." International Conference on Machine Learning, 2019.](https://mlanthology.org/icml/2019/zhao2019icml-maximum/)

BibTeX

@inproceedings{zhao2019icml-maximum,
  title     = {{Maximum Entropy-Regularized Multi-Goal Reinforcement Learning}},
  author    = {Zhao, Rui and Sun, Xudong and Tresp, Volker},
  booktitle = {International Conference on Machine Learning},
  year      = {2019},
  pages     = {7553--7562},
  volume    = {97},
  url       = {https://mlanthology.org/icml/2019/zhao2019icml-maximum/}
}