Robust Reinforcement Learning for Continuous Control with Model Misspecification

Abstract

We provide a framework for incorporating robustness to perturbations in the transition dynamics, which we refer to as model misspecification, into continuous control Reinforcement Learning (RL) algorithms. We specifically focus on incorporating robustness into a state-of-the-art continuous control RL algorithm called Maximum a-posteriori Policy Optimization (MPO). We achieve this by learning a policy that optimizes a worst-case, entropy-regularized, expected return objective, and we derive a corresponding robust entropy-regularized Bellman contraction operator. In addition, we introduce a less conservative, soft-robust, entropy-regularized objective with a corresponding Bellman operator. We show that both robust and soft-robust policies outperform their non-robust counterparts in nine MuJoCo domains with environment perturbations. In addition, we show improved robust performance on a challenging, simulated, dexterous robotic hand. Finally, we present multiple investigative experiments that provide deeper insight into the robustness framework, including an adaptation to another continuous control RL algorithm. Performance videos can be found online at https://sites.google.com/view/robust-rl.
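
To make the difference between the robust and soft-robust objectives concrete, the following is a minimal tabular sketch, not the MPO-based method from the paper. It assumes a small finite MDP, a fixed stochastic policy, and a finite uncertainty set of K candidate transition models; the robust backup takes the worst case over models for each state-action pair, while the soft-robust backup averages over models under a fixed weighting. All names (`backup`, `evaluate`, `P_set`, `weights`, `tau`) and the plain entropy bonus with temperature `tau` are illustrative assumptions rather than the paper's notation or implementation.

```python
import numpy as np

def policy_entropy(pi):
    """Shannon entropy of pi(.|s) for every state; pi has shape (S, A)."""
    return -np.sum(pi * np.log(pi + 1e-12), axis=1)

def backup(V, pi, R, P_set, gamma=0.9, tau=0.1, weights=None):
    """One entropy-regularized policy-evaluation backup (illustrative only).

    V: (S,) values, pi: (S, A) policy, R: (S, A) rewards,
    P_set: (K, S, A, S) uncertainty set of K transition models.
    weights=None -> robust: worst case over the K models per (s, a).
    weights=(K,) -> soft-robust: weighted average over the K models.
    """
    # Expected next-state value under each candidate model: shape (K, S, A).
    next_v = np.einsum("ksat,t->ksa", P_set, V)
    if weights is None:
        agg = next_v.min(axis=0)                     # worst case
    else:
        agg = np.tensordot(weights, next_v, axes=1)  # average over models
    q = R + gamma * agg
    return np.sum(pi * q, axis=1) + tau * policy_entropy(pi)

def evaluate(pi, R, P_set, weights=None, iters=300, **kwargs):
    """Iterate the backup from V = 0 until it is (numerically) at its fixed point."""
    V = np.zeros(R.shape[0])
    for _ in range(iters):
        V = backup(V, pi, R, P_set, weights=weights, **kwargs)
    return V

# Hypothetical toy problem: random models, random rewards, uniform policy.
rng = np.random.default_rng(0)
S, A, K = 3, 2, 4
P_set = rng.random((K, S, A, S))
P_set /= P_set.sum(axis=-1, keepdims=True)  # normalize into valid distributions
R = rng.random((S, A))
pi = np.full((S, A), 1.0 / A)
uniform = np.full(K, 1.0 / K)

V_robust = evaluate(pi, R, P_set)                # worst-case values
V_soft = evaluate(pi, R, P_set, weights=uniform) # soft-robust values
```

In this sketch the robust values never exceed the soft-robust values, which reflects why the soft-robust objective is described as less conservative: it averages over the candidate models instead of planning against the single worst one.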

Cite

Text

Mankowitz et al. "Robust Reinforcement Learning for Continuous Control with Model Misspecification." International Conference on Learning Representations, 2020.

Markdown

[Mankowitz et al. "Robust Reinforcement Learning for Continuous Control with Model Misspecification." International Conference on Learning Representations, 2020.](https://mlanthology.org/iclr/2020/mankowitz2020iclr-robust/)

BibTeX

@inproceedings{mankowitz2020iclr-robust,
  title     = {{Robust Reinforcement Learning for Continuous Control with Model Misspecification}},
  author    = {Mankowitz, Daniel J. and Levine, Nir and Jeong, Rae and Shi, Yuanyuan and Kay, Jackie and Abdolmaleki, Abbas and Springenberg, Jost Tobias and Mann, Timothy and Hester, Todd and Riedmiller, Martin},
  booktitle = {International Conference on Learning Representations},
  year      = {2020},
  url       = {https://mlanthology.org/iclr/2020/mankowitz2020iclr-robust/}
}