Risk-Averse Learning by Temporal Difference Methods with Markov Risk Measures
Abstract
We propose a novel reinforcement learning methodology in which system performance is evaluated by a Markov coherent dynamic risk measure, using linear value function approximation. We construct projected risk-averse dynamic programming equations and study their properties. We then propose risk-averse counterparts of the basic and multi-step temporal difference methods and prove their convergence with probability one. Finally, we report an empirical study on a complex transportation problem.
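To make the setting concrete, below is a minimal illustrative sketch of a risk-averse TD(0) update with linear value function approximation. It is not the authors' algorithm: it assumes a hypothetical simulator interface (env.reset, env.sample_transition returning a cost and next state) and a feature map features, and it uses the first-order mean-semideviation, a standard coherent risk measure, as the one-step Markov risk mapping, estimating it from a small resampled batch of transitions rather than by the paper's stochastic-approximation scheme.

import numpy as np

def mean_semideviation(z, kappa=0.5):
    # Empirical first-order mean-semideviation, a coherent risk measure:
    # rho(Z) = E[Z] + kappa * E[(Z - E[Z])_+], with kappa in [0, 1].
    m = z.mean()
    return m + kappa * np.maximum(z - m, 0.0).mean()

def risk_averse_td0(env, features, theta, alpha=0.05, gamma=0.95,
                    kappa=0.5, batch=16, steps=10_000):
    # Sketch of a risk-averse TD(0) update with linear approximation
    # v_theta(x) = theta @ features(x). At each visited state we resample
    # a small batch of transitions to estimate the one-step risk of the
    # successor value, then take a semi-gradient step on the residual.
    x = env.reset()
    for _ in range(steps):
        # Estimate rho_x( c(x, X') + gamma * v_theta(X') ) empirically
        # from several independent transitions out of x.
        targets = np.array([
            c + gamma * theta @ features(x_next)
            for (c, x_next) in (env.sample_transition(x) for _ in range(batch))
        ])
        delta = mean_semideviation(targets, kappa) - theta @ features(x)
        theta = theta + alpha * delta * features(x)
        _, x = env.sample_transition(x)  # advance the trajectory
    return theta

With kappa = 0 the risk mapping reduces to the expectation and the update collapses to ordinary TD(0) with linear features, which is a useful sanity check for the sketch.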
Cite
Text

Köse and Ruszczyński. "Risk-Averse Learning by Temporal Difference Methods with Markov Risk Measures." Journal of Machine Learning Research, 2021.

Markdown

[Köse and Ruszczyński. "Risk-Averse Learning by Temporal Difference Methods with Markov Risk Measures." Journal of Machine Learning Research, 2021.](https://mlanthology.org/jmlr/2021/kose2021jmlr-riskaverse/)

BibTeX
@article{kose2021jmlr-riskaverse,
title = {{Risk-Averse Learning by Temporal Difference Methods with Markov Risk Measures}},
author = {Köse, Ümit and Ruszczyński, Andrzej},
journal = {Journal of Machine Learning Research},
year = {2021},
pages = {1--34},
volume = {22},
url = {https://mlanthology.org/jmlr/2021/kose2021jmlr-riskaverse/}
}