Finding Safe Zones of Markov Decision Processes Policies
Abstract
Given a policy of a Markov Decision Process (MDP), we define a SafeZone as a subset of states, such that most of the policy's trajectories are confined to this subset. The quality of a SafeZone is parameterized by the number of states and the escape probability, i.e., the probability that a random trajectory will leave the subset. SafeZones are especially interesting when they have a small number of states and low escape probability. We study the complexity of finding optimal SafeZones, and show that in general the problem is computationally hard. For this reason, we concentrate on finding approximate SafeZones. Our main result is a bi-criteria approximation learning algorithm with a factor of almost $2$ approximation for both the escape probability and SafeZone size, using polynomial sample complexity.
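To make the escape probability concrete, below is a minimal sketch of how it could be estimated from sampled trajectories of the policy. The function name and the representation of trajectories as sequences of hashable states are illustrative assumptions, not the paper's algorithm.

```python
from typing import Hashable, Iterable, Sequence, Set

def estimate_escape_probability(
    trajectories: Sequence[Iterable[Hashable]],
    safe_zone: Set[Hashable],
) -> float:
    """Monte Carlo estimate of a candidate SafeZone's escape probability:
    the fraction of sampled trajectories that visit at least one state
    outside the given subset of states."""
    escapes = sum(
        any(state not in safe_zone for state in trajectory)
        for trajectory in trajectories
    )
    return escapes / len(trajectories)

# Illustrative usage: three sampled trajectories over string-labeled states.
trajectories = [["s0", "s1", "s1"], ["s0", "s2"], ["s0", "s1"]]
print(estimate_escape_probability(trajectories, {"s0", "s1"}))  # 1/3
```

A SafeZone with few states and a small estimate here is exactly the kind of subset the paper's bi-criteria algorithm seeks.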
BibTeX
@inproceedings{cohen2023neurips-finding,
title = {{Finding Safe Zones of Markov Decision Processes Policies}},
author = {Cohen, Lee and Mansour, Yishay and Moshkovitz, Michal},
booktitle = {Neural Information Processing Systems},
year = {2023},
url = {https://mlanthology.org/neurips/2023/cohen2023neurips-finding/}
}