Safe Policy Learning for Continuous Control

Abstract

We study continuous-action reinforcement learning problems in which it is crucial that the agent interact with the environment only through near-safe policies, i.e., policies that keep the agent in desirable situations, both during training and at convergence. We formulate these problems as *constrained* Markov decision processes (CMDPs) and present safe policy optimization algorithms, based on a Lyapunov approach, to solve them. Our algorithms can use any standard policy gradient (PG) method, such as deep deterministic policy gradient (DDPG) or proximal policy optimization (PPO), to train a neural network policy, while enforcing near-constraint satisfaction at every policy update by projecting either the policy parameters or the selected action onto the set of feasible solutions induced by the state-dependent linearized Lyapunov constraints. Compared to existing constrained PG algorithms, ours are more data-efficient, as they are able to use both on-policy and off-policy data. Moreover, in practice our action-projection algorithm often leads to less conservative policy updates and allows for natural integration into an end-to-end PG training pipeline. We evaluate our algorithms and compare them with state-of-the-art baselines on several simulated (MuJoCo) tasks, as well as on a real-world robot obstacle-avoidance problem, demonstrating their effectiveness in balancing performance and constraint satisfaction.
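
To give a rough feel for the action-projection step the abstract describes, here is a minimal sketch of projecting a proposed action onto a single state-dependent linear constraint of the form g·a + b ≤ eps (a simplified stand-in for the paper's linearized Lyapunov constraints, which may involve more structure). The names `project_action`, `g`, `b`, and `eps` are illustrative, not taken from the paper; this is not the authors' implementation.

```python
import numpy as np

def project_action(action, g, b, eps=0.0):
    """Project `action` onto the half-space {a : g @ a + b <= eps}.

    Closed-form solution of the QP
        min_a' ||a' - action||^2   s.t.   g @ a' + b <= eps,
    i.e., a Euclidean projection onto one linear (safety) constraint.
    A sketch only: the paper's safety layer handles the general
    state-dependent linearized Lyapunov constraints.
    """
    violation = g @ action + b - eps
    if violation <= 0.0:
        return action  # already feasible: leave the action unchanged
    # Otherwise, move along -g just far enough to restore feasibility.
    return action - (violation / (g @ g)) * g

# Toy usage: a 2-D action violating the (illustrative) constraint a1 + a2 <= 1.
a = np.array([1.0, 0.5])
g = np.array([1.0, 1.0])   # hypothetical linearized constraint gradient
b = -1.0                   # offset, so g @ a + b <= 0 means "safe"
print(project_action(a, g, b))  # -> [0.75, 0.25], on the constraint boundary
```

Because this projection is differentiable almost everywhere, a layer like this can sit at the end of the policy network, which is what makes the end-to-end integration mentioned above possible.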

Cite

Text

Chow et al. "Safe Policy Learning for Continuous Control." Conference on Robot Learning, 2020.

Markdown

[Chow et al. "Safe Policy Learning for Continuous Control." Conference on Robot Learning, 2020.](https://mlanthology.org/corl/2020/chow2020corl-safe/)

BibTeX

@inproceedings{chow2020corl-safe,
  title     = {{Safe Policy Learning for Continuous Control}},
  author    = {Chow, Yinlam and Nachum, Ofir and Faust, Aleksandra and Dueñez-Guzman, Edgar and Ghavamzadeh, Mohammad},
  booktitle = {Conference on Robot Learning},
  year      = {2020},
  pages     = {801--821},
  volume    = {155},
  url       = {https://mlanthology.org/corl/2020/chow2020corl-safe/}
}