SPoRt - Safe Policy Ratio: Certified Training and Deployment of Task Policies in Model-Free RL
Abstract
To apply reinforcement learning to safety-critical applications, we ought to provide safety guarantees during both policy training and deployment. In this work, we present novel theoretical results that bound the probability of violating a safety property for a new task-specific policy in a model-free, episodic setup. The bound, based on a 'maximum policy ratio' computed with respect to a 'safe' base policy, also applies more generally to temporally extended properties (beyond safety) and to robust control problems. We thus present SPoRt, which also provides a data-driven approach, based on scenario theory, for obtaining such a bound for the base policy, and which includes Projected PPO, a new projection-based approach for training the task-specific policy while maintaining a user-specified bound on property violation. Hence, SPoRt enables the user to trade off safety guarantees against task-specific performance. Accordingly, we present experimental results demonstrating this trade-off, as well as a comparison of the theoretical bound to posterior bounds based on empirical violation rates.
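The abstract does not state the bounds themselves, so the following is only a minimal sketch, under our own assumptions, of how the two ingredients it names could compose. (i) For the base policy, it uses a one-sided binomial-tail (Clopper-Pearson-style) confidence bound on the violation probability from N i.i.d. episodes, a standard stand-in for the scenario-theory bound mentioned above; the paper's exact expression may differ. (ii) For the task policy, it uses a change-of-measure argument: if pi_task(a|s) <= rho * pi_base(a|s) for every state and action, then in the trajectory likelihood ratio the unknown transition probabilities cancel, so any set of episodes of at most T steps has probability under pi_task at most rho^T times its probability under pi_base. The function names, the tabular policy format, and the fixed-horizon form are all hypothetical.

```python
import math


def binomial_cdf(k: int, n: int, p: float) -> float:
    """P[X <= k] for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_comb = math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        total += math.exp(log_comb + i * log_p + (n - i) * log_q)
    return min(total, 1.0)


def base_violation_bound(n_episodes: int, n_violations: int, beta: float,
                         tol: float = 1e-12) -> float:
    """Hypothetical helper: upper bound eps on the base policy's violation
    probability, holding with confidence at least 1 - beta, given n_violations
    out of n_episodes i.i.d. episodes (assumption: episodes are i.i.d.).

    Finds the smallest eps with binomial_cdf(n_violations, n_episodes, eps) <= beta
    by bisection; the CDF is decreasing in eps when n_violations < n_episodes.
    """
    if n_violations >= n_episodes:
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if binomial_cdf(n_violations, n_episodes, mid) > beta:
            lo = mid  # observing <= k violations is still plausible at mid: bound lies above
        else:
            hi = mid
    return hi


def max_policy_ratio(task_probs, base_probs) -> float:
    """Hypothetical helper: max over (s, a) of pi_task(a|s) / pi_base(a|s) for
    tabular policies, each given as a list of per-state action-probability rows."""
    ratio = 0.0
    for task_row, base_row in zip(task_probs, base_probs):
        for p_task, p_base in zip(task_row, base_row):
            if p_task > 0.0:
                if p_base == 0.0:
                    return math.inf  # task policy takes an action the base policy never does
                ratio = max(ratio, p_task / p_base)
    return ratio


def task_violation_bound(eps_base: float, max_ratio: float, horizon: int) -> float:
    """Change-of-measure bound: P_task(violation) <= max_ratio**horizon * eps_base.

    For episodes of at most `horizon` steps, the trajectory likelihood ratio is a
    product of per-step policy ratios (transitions cancel); since max_ratio >= 1 for
    proper distributions, that product is at most max_ratio**horizon.
    """
    return min(1.0, max_ratio ** horizon * eps_base)


def allowed_ratio(eps_base: float, target: float, horizon: int) -> float:
    """Largest per-step ratio rho with rho**horizon * eps_base <= target."""
    if not 0.0 < eps_base <= target:
        raise ValueError("need 0 < eps_base <= target")
    return (target / eps_base) ** (1.0 / horizon)
```

As a usage sketch, one would first call `base_violation_bound` on logged base-policy episodes, then either check a trained policy with `task_violation_bound(eps, max_policy_ratio(task, base), T)` or compute `allowed_ratio(eps, target, T)` up front as a per-step ratio budget. Because the factor grows as rho^T, even per-step ratios slightly above 1 loosen the guarantee quickly over long horizons. This is consistent with the abstract's emphasis on keeping a user-specified bound in force during training, although how Projected PPO enforces it is not shown here.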
Cite
Text
Cloete et al. "SPoRt - Safe Policy Ratio: Certified Training and Deployment of Task Policies in Model-Free RL." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/554
Markdown
[Cloete et al. "SPoRt - Safe Policy Ratio: Certified Training and Deployment of Task Policies in Model-Free RL." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/cloete2025ijcai-sport/) doi:10.24963/IJCAI.2025/554
BibTeX
@inproceedings{cloete2025ijcai-sport,
title = {{SPoRt - Safe Policy Ratio: Certified Training and Deployment of Task Policies in Model-Free RL}},
author = {Cloete, Jacques and Vertovec, Nikolaus and Abate, Alessandro},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2025},
pages = {4976--4984},
doi = {10.24963/IJCAI.2025/554},
url = {https://mlanthology.org/ijcai/2025/cloete2025ijcai-sport/}
}