Towards Robust and Safe Reinforcement Learning with Benign Off-Policy Data

Abstract

Previous work demonstrates that the optimal safe reinforcement learning policy in a noise-free environment is vulnerable and could be unsafe under observational attacks. While adversarial training effectively improves robustness and safety, collecting samples by attacking the behavior agent online could be expensive or prohibitively dangerous in many applications. We propose the robuSt vAriational ofF-policy lEaRning (SAFER) approach, which only requires benign training data without attacking the agent. SAFER obtains an optimal non-parametric variational policy distribution via convex optimization and then uses it to improve the parameterized policy robustly via supervised learning. The two-stage policy optimization facilitates robust training, and extensive experiments on multiple robot platforms show the efficiency of SAFER in learning a robust and safe policy: it achieves the same reward with far fewer constraint violations during training than on-policy baselines.

Cite

Text

Liu et al. "Towards Robust and Safe Reinforcement Learning with Benign Off-Policy Data." International Conference on Machine Learning, 2023.

Markdown

[Liu et al. "Towards Robust and Safe Reinforcement Learning with Benign Off-Policy Data." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/liu2023icml-robust/)

BibTeX

@inproceedings{liu2023icml-robust,
  title     = {{Towards Robust and Safe Reinforcement Learning with Benign Off-Policy Data}},
  author    = {Liu, Zuxin and Guo, Zijian and Cen, Zhepeng and Zhang, Huan and Yao, Yihang and Hu, Hanjiang and Zhao, Ding},
  booktitle = {International Conference on Machine Learning},
  year      = {2023},
  pages     = {21586--21610},
  volume    = {202},
  url       = {https://mlanthology.org/icml/2023/liu2023icml-robust/}
}