Risk-Averse Learning by Temporal Difference Methods with Markov Risk Measures

Abstract

We propose a novel reinforcement learning methodology in which system performance is evaluated by a Markov coherent dynamic risk measure and value functions are approximated by linear architectures. We construct projected risk-averse dynamic programming equations and study their properties. We propose new risk-averse counterparts of the basic and multi-step temporal difference methods and prove their convergence with probability one. We also report an empirical study on a complex transportation problem.
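
For intuition, the following is a minimal Python sketch of a risk-averse TD(0)-style update with linear value approximation, in the spirit of the abstract. It uses the mean-semideviation mapping rho(Z) = E[Z] + kappa * E[max(Z - E[Z], 0)] (coherent for 0 <= kappa <= 1) as the one-step Markov risk measure, with a slower-timescale running mean used to form the semideviation term. The names `env_step` and `phi`, the step sizes, and the two-timescale scheme are illustrative assumptions for this sketch, not the authors' algorithm or its convergence-guaranteed form.

```python
import numpy as np

def risk_averse_td0(env_step, phi, n_states, n_features,
                    gamma=0.95, kappa=0.5, alpha=0.1, beta=0.02,
                    n_iters=50_000, x0=0, seed=0):
    """Sketch of a risk-averse TD(0) update with linear value approximation.

    One-step risk mapping (mean-upper-semideviation, coherent for 0<=kappa<=1):
        rho(Z) = E[Z] + kappa * E[max(Z - E[Z], 0)],
    applied to the next-state value Z = v(x') in the cost-minimization setting.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(n_features)   # weights of the linear model v(x) ~ phi(x) @ theta
    m = np.zeros(n_states)         # per-state running estimate of E[v(x')]
    x = x0
    for _ in range(n_iters):
        c, x_next = env_step(x, rng)          # one simulated transition: cost, next state
        v_next = phi(x_next) @ theta
        # slower-timescale update of the auxiliary mean estimate
        m[x] += beta * (v_next - m[x])
        # risk-adjusted target: mean plus penalized upper semideviation
        target = c + gamma * (m[x] + kappa * max(v_next - m[x], 0.0))
        td_error = target - phi(x) @ theta
        theta += alpha * td_error * phi(x)    # gradient-style TD weight update
        x = x_next
    return theta

# Toy usage on a two-state chain: state 0 is "cheap", state 1 is "costly".
def env_step(x, rng):
    x_next = int(rng.integers(2))             # uniform random transitions
    cost = 1.0 if x == 1 else 0.1
    return cost, x_next

phi = lambda x: np.eye(2)[x]                  # one-hot (tabular) features
theta = risk_averse_td0(env_step, phi, n_states=2, n_features=2)
print(theta)  # risk-adjusted value estimates; they grow as kappa increases
```

Setting kappa = 0 recovers the ordinary risk-neutral TD(0) update, which makes the role of the semideviation penalty easy to isolate in experiments.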

Cite

Text

Ümit Köse and Andrzej Ruszczyński. "Risk-Averse Learning by Temporal Difference Methods with Markov Risk Measures." Journal of Machine Learning Research, 22:1-34, 2021.

Markdown

[Ümit Köse and Andrzej Ruszczyński. "Risk-Averse Learning by Temporal Difference Methods with Markov Risk Measures." Journal of Machine Learning Research, 22:1-34, 2021.](https://mlanthology.org/jmlr/2021/kose2021jmlr-riskaverse/)

BibTeX

@article{kose2021jmlr-riskaverse,
  title     = {{Risk-Averse Learning by Temporal Difference Methods with Markov Risk Measures}},
  author    = {Köse, Ümit and Ruszczyński, Andrzej},
  journal   = {Journal of Machine Learning Research},
  year      = {2021},
  volume    = {22},
  pages     = {1--34},
  url       = {https://mlanthology.org/jmlr/2021/kose2021jmlr-riskaverse/}
}