Reinforcement Learning with Bounded Risk
Abstract
In this paper, we consider finite MDPs with fatal states. We define the risk under a policy as the probability of entering a fatal state, which is different from the notion of risk normally used in DP and RL (most often regarding the variance of the return). We consider the problem of finding optimal policies with bounded risk, i.e. where the risk is smaller than some user-specified threshold ω, and formalize it as a constrained MDP with two infinite-horizon criteria: a discounted one for the value of a state and an undiscounted one for the risk. We define a heuristic, model-free reinforcement learning algorithm that finds good deterministic policies for the constrained problem. The algorithm is based on an abstract ordering of the multi-dimensional return space and uses a weighted formulation of the problem. The internal weight parameter is adjusted by a heuristic optimization algorithm.
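To make the constrained problem concrete, the sketch below evaluates both criteria exactly on a tiny MDP and picks the best deterministic policy whose risk stays within the threshold. It is only an illustration under stated assumptions, not the paper's algorithm: the paper's method is model-free and uses a weighted formulation, whereas this sketch uses a known model and brute-force enumeration. The two-state MDP, its rewards and probabilities, the discount factor, the names `evaluate` and `best_policy`, the choice to constrain only the start state, and the use of "at most ω" as the feasibility test are all hypothetical.

```python
from itertools import product

# Hypothetical toy MDP: two ordinary states (0, 1) and two absorbing ones.
GOAL, FATAL = "goal", "fatal"
STATES = (0, 1)
ACTIONS = ("safe", "risky")
GAMMA = 0.9  # discount factor for the value criterion (assumed)

# P[(state, action)] -> list of (probability, next_state, reward)
P = {
    (0, "safe"):  [(1.0, 1, 0.0)],
    (0, "risky"): [(0.9, 1, 1.0), (0.1, FATAL, 0.0)],
    (1, "safe"):  [(1.0, GOAL, 1.0)],
    (1, "risky"): [(0.8, GOAL, 3.0), (0.2, FATAL, 0.0)],
}


def evaluate(policy, sweeps=50):
    """Iterative policy evaluation of both criteria.

    value: discounted expected return.
    risk:  undiscounted probability of eventually entering FATAL.
    """
    value = {s: 0.0 for s in STATES}
    risk = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        for s in STATES:
            outcomes = P[(s, policy[s])]
            # Absorbing states contribute zero future value.
            value[s] = sum(p * (r + GAMMA * value.get(nxt, 0.0))
                           for p, nxt, r in outcomes)
            # Entering FATAL counts as 1; GOAL carries no further risk.
            risk[s] = sum(p * (1.0 if nxt == FATAL else risk.get(nxt, 0.0))
                          for p, nxt, r in outcomes)
    return value, risk


def best_policy(omega, start=0):
    """Enumerate deterministic policies; keep those with risk at most omega
    in the start state and return the one with the highest value there."""
    best = None
    for choice in product(ACTIONS, repeat=len(STATES)):
        policy = dict(zip(STATES, choice))
        value, risk = evaluate(policy)
        if risk[start] <= omega + 1e-12 and (best is None or value[start] > best[1]):
            best = (policy, value[start], risk[start])
    return best


if __name__ == "__main__":
    for omega in (0.15, 0.25):
        print(omega, best_policy(omega))
```

Loosening the threshold from 0.15 to 0.25 changes the selected policy in this toy example, which shows how the risk bound trades off against the discounted value. Enumeration is only feasible for very small problems.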
Cite
Text
Geibel. "Reinforcement Learning with Bounded Risk." International Conference on Machine Learning, 2001.
Markdown
[Geibel. "Reinforcement Learning with Bounded Risk." International Conference on Machine Learning, 2001.](https://mlanthology.org/icml/2001/geibel2001icml-reinforcement/)
BibTeX
@inproceedings{geibel2001icml-reinforcement,
title = {{Reinforcement Learning with Bounded Risk}},
author = {Geibel, Peter},
booktitle = {International Conference on Machine Learning},
year = {2001},
pages = {162-169},
url = {https://mlanthology.org/icml/2001/geibel2001icml-reinforcement/}
}