Risk-Sensitive Reinforcement Learning

Mihatsch, Oliver; Neuneier, Ralph

doi:10.1023/A:1017940631555

Machine Learning (MLJ), 2002, pp. 267-290

Abstract

Most reinforcement learning algorithms optimize the expected return of a Markov Decision Problem. Practice has taught us that this criterion is not always the most suitable, because many applications require robust control strategies which also take into account the variance of the return. Classical control literature provides several techniques to deal with risk-sensitive optimization goals, such as the so-called worst-case optimality criterion, which focuses exclusively on risk-avoiding policies, or classical risk-sensitive control, which transforms the returns by exponential utility functions. While the first approach is typically too restrictive, the latter suffers from the absence of an obvious way to design a corresponding model-free reinforcement learning algorithm. Our risk-sensitive reinforcement learning algorithm is based on a very different philosophy. Instead of transforming the return of the process, we transform the temporal differences during learning. While our approach reflects important properties of the classical exponential utility framework, we avoid its serious drawbacks for learning. Based on an extended set of optimality equations, we are able to formulate risk-sensitive versions of various well-known reinforcement learning algorithms which converge with probability one under the usual conditions.
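
The idea of transforming temporal differences rather than returns can be illustrated with a minimal tabular sketch. The abstract does not state the form of the transformation, so the code below assumes a piecewise-linear weighting with a risk parameter `kappa` in (-1, 1): positive temporal differences are scaled by `1 - kappa` and negative ones by `1 + kappa`. The function names `transform_td` and `risk_sensitive_q_update`, the list-of-lists Q-table, and the Q-learning-style target are illustrative choices, not taken from this page.

```python
def transform_td(delta, kappa):
    """Weight a temporal difference asymmetrically (assumed piecewise-linear form).

    kappa > 0 shrinks positive surprises and amplifies negative ones (risk-averse);
    kappa < 0 does the opposite; kappa == 0 leaves the temporal difference unchanged.
    """
    if not -1.0 < kappa < 1.0:
        raise ValueError("kappa must lie in the open interval (-1, 1)")
    if delta > 0:
        return (1.0 - kappa) * delta
    return (1.0 + kappa) * delta


def risk_sensitive_q_update(q, state, action, reward, next_state,
                            alpha, gamma, kappa, done=False):
    """Apply one Q-learning-style update using the transformed temporal difference.

    q is a hypothetical tabular value function: q[state][action] -> float.
    """
    # Bootstrapped target: no future value once the episode has ended.
    target = reward if done else reward + gamma * max(q[next_state])
    delta = target - q[state][action]
    # The only change relative to a standard update is the transformation of delta.
    q[state][action] += alpha * transform_td(delta, kappa)
    return q[state][action]


if __name__ == "__main__":
    q_table = [[0.0, 0.0], [1.0, 2.0]]
    new_value = risk_sensitive_q_update(
        q_table, state=0, action=0, reward=1.0, next_state=1,
        alpha=0.5, gamma=0.5, kappa=0.5,
    )
    print(new_value)
```

With `kappa = 0` the update reduces to the usual risk-neutral rule, which makes the parameter a convenient single knob for comparing risk-averse and risk-neutral behaviour in the same code path.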

Cite

Text

Mihatsch and Neuneier. "Risk-Sensitive Reinforcement Learning." Machine Learning, 2002. doi:10.1023/A:1017940631555

Markdown

[Mihatsch and Neuneier. "Risk-Sensitive Reinforcement Learning." Machine Learning, 2002.](https://mlanthology.org/mlj/2002/mihatsch2002mlj-risksensitive/) doi:10.1023/A:1017940631555

BibTeX

@article{mihatsch2002mlj-risksensitive,
  title     = {{Risk-Sensitive Reinforcement Learning}},
  author    = {Mihatsch, Oliver and Neuneier, Ralph},
  journal   = {Machine Learning},
  year      = {2002},
  pages     = {267--290},
  doi       = {10.1023/A:1017940631555},
  volume    = {49},
  url       = {https://mlanthology.org/mlj/2002/mihatsch2002mlj-risksensitive/}
}