Consideration of Risk in Reinforcement Learning

Abstract

Most Reinforcement Learning (RL) work considers a policy for a sequential decision task to be optimal if it minimizes the expected total discounted cost (e.g., Q-learning and the AHC architecture). However, it is well known that the expected value is not always a reliable decision criterion and can even be treacherous. Decision theory has proposed many alternative criteria that account for risk more carefully, but most RL researchers have not yet concerned themselves with this subject. The purpose of this paper is to draw the reader's attention to the problems of the expected-value criterion in Markov decision processes and to give Dynamic Programming algorithms for an alternative criterion, namely the minimax criterion. A counterpart to Watkins' Q-learning with respect to the minimax criterion is presented. The new algorithm, called Q̂-learning, finds policies that minimize the worst-case total discounted cost.
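As a rough illustration of the idea, here is a minimal sketch of a worst-case (minimax) Q-learning update on a tiny invented MDP. The update rule replaces the usual averaging step with a maximum over observed backups, so the value table converges toward worst-case rather than expected discounted cost; the two-state MDP, its actions, and its costs are hypothetical and chosen only to show how the risky and safe actions are ranked differently under the worst-case criterion.

```python
GAMMA = 0.9  # discount factor (illustrative choice)

# Hypothetical MDP: state 0 = start, state 1 = absorbing terminal (zero cost).
# Action "risky" from state 0 can incur cost 1 or cost 5; "safe" always costs 3.
# Each entry lists the (cost, next_state) outcomes that can occur.
OUTCOMES = {
    (0, "risky"): [(1.0, 1), (5.0, 1)],
    (0, "safe"):  [(3.0, 1)],
    (1, "risky"): [(0.0, 1)],
    (1, "safe"):  [(0.0, 1)],
}
ACTIONS = ["risky", "safe"]

def qhat_update(Q, s, a, cost, s_next):
    """Worst-case backup: keep the largest value ever observed for (s, a).

    Contrast with standard Q-learning, which moves Q[(s, a)] a step
    toward the backup and thus estimates the *expected* cost.
    """
    backup = cost + GAMMA * min(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] = max(Q[(s, a)], backup)

# Initialize below the true values so the max-update can only grow toward
# the worst-case cost, then sweep all possible outcomes a few times.
Q = {(s, a): 0.0 for s in (0, 1) for a in ACTIONS}
for _ in range(3):
    for (s, a), outcomes in OUTCOMES.items():
        for cost, s_next in outcomes:
            qhat_update(Q, s, a, cost, s_next)

greedy = min(ACTIONS, key=lambda a: Q[(0, a)])
print(Q[(0, "risky")], Q[(0, "safe")], greedy)  # → 5.0 3.0 safe
```

An expected-cost learner would prefer "risky" whenever the high-cost outcome is sufficiently rare; the worst-case learner prefers "safe" because its worst outcome (3) beats the risky action's worst outcome (5), which is exactly the kind of risk-averse policy the minimax criterion selects.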

Cite

Text

Heger. "Consideration of Risk in Reinforcement Learning." International Conference on Machine Learning, 1994. doi:10.1016/B978-1-55860-335-6.50021-0

Markdown

[Heger. "Consideration of Risk in Reinforcement Learning." International Conference on Machine Learning, 1994.](https://mlanthology.org/icml/1994/heger1994icml-consideration/) doi:10.1016/B978-1-55860-335-6.50021-0

BibTeX

@inproceedings{heger1994icml-consideration,
  title     = {{Consideration of Risk in Reinforcement Learning}},
  author    = {Heger, Matthias},
  booktitle = {International Conference on Machine Learning},
  year      = {1994},
  pages     = {105-111},
  doi       = {10.1016/B978-1-55860-335-6.50021-0},
  url       = {https://mlanthology.org/icml/1994/heger1994icml-consideration/}
}