Safer Reinforcement Learning by Going Off-Policy: A Benchmark
Abstract
Avoiding violations of safety constraints while learning to solve tasks is the main concern of Safe Reinforcement Learning (SafeRL). Most prior studies focus on solving SafeRL problems with on-policy algorithms, which obtain stable results at the expense of sample efficiency. In this paper, we study SafeRL from the off-policy perspective. We argue that off-policy RL algorithms are better suited for SafeRL, since minimizing the number of collected samples results in fewer safety penalties. We show that off-policy algorithms achieve better safety metrics than their on-policy competitors at the same performance level, and we provide a benchmark of 6 modern off-policy algorithms evaluated on 30 environments from the state-of-the-art SafetyGymnasium environment set.
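To make the evaluation setting concrete, the sketch below shows a minimal interaction loop with a SafetyGymnasium task, in which safety violations arrive as a separate cost signal alongside the reward. It assumes the standard Safety-Gymnasium API (safety_gymnasium.make and a step signature returning a cost); the task name and random policy are illustrative placeholders, not the benchmarked algorithms.

# Minimal sketch: collecting transitions with a safety cost signal,
# assuming the standard Safety-Gymnasium API.
import safety_gymnasium

env = safety_gymnasium.make("SafetyPointGoal1-v0")  # illustrative task choice
obs, info = env.reset(seed=0)

episode_cost = 0.0
for _ in range(1000):
    action = env.action_space.sample()  # placeholder for an off-policy agent's action
    obs, reward, cost, terminated, truncated, info = env.step(action)
    episode_cost += cost  # safety violations accumulate as cost
    if terminated or truncated:
        obs, info = env.reset()
        episode_cost = 0.0
env.close()

An off-policy agent would store these (obs, action, reward, cost, next_obs) transitions in a replay buffer; the benchmark's premise is that reusing them reduces how many unsafe interactions the agent must incur during training.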
Cite
Text:
Kuznetsov. "Safer Reinforcement Learning by Going Off-Policy: A Benchmark." ICML 2024 Workshops: NextGenAISafety, 2024.
Markdown:
[Kuznetsov. "Safer Reinforcement Learning by Going Off-Policy: A Benchmark." ICML 2024 Workshops: NextGenAISafety, 2024.](https://mlanthology.org/icmlw/2024/kuznetsov2024icmlw-safer/)
BibTeX:
@inproceedings{kuznetsov2024icmlw-safer,
title = {{Safer Reinforcement Learning by Going Off-Policy: A Benchmark}},
author = {Kuznetsov, Igor},
booktitle = {ICML 2024 Workshops: NextGenAISafety},
year = {2024},
url = {https://mlanthology.org/icmlw/2024/kuznetsov2024icmlw-safer/}
}