Robustness and Risk-Sensitivity in Markov Decision Processes
Abstract
We uncover relations between robust MDPs and risk-sensitive MDPs. The objective of a robust MDP is to minimize a function, such as the expectation of cumulative cost, for the worst case when the parameters have uncertainties. The objective of a risk-sensitive MDP is to minimize a risk measure of the cumulative cost when the parameters are known. We show that a risk-sensitive MDP of minimizing the expected exponential utility is equivalent to a robust MDP of minimizing the worst-case expectation with a penalty for the deviation of the uncertain parameters from their nominal values, which is measured with the Kullback-Leibler divergence. We also show that a risk-sensitive MDP of minimizing an iterated risk measure that is composed of certain coherent risk measures is equivalent to a robust MDP of minimizing the worst-case expectation when the possible deviations of uncertain parameters from their nominal values are characterized with a concave function.
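The first equivalence stated in the abstract can be illustrated for a single random cost on a finite outcome space. This is a minimal sketch, not the paper's MDP construction. It relies on the standard variational identity for the exponential utility, namely that (1/gamma) * log E_p[exp(gamma * X)] equals the maximum over distributions q of E_q[X] - (1/gamma) * KL(q || p), attained at q proportional to p * exp(gamma * X). The function names, the cost vector, the nominal probabilities, and the value of `gamma` below are all illustrative assumptions.

```python
import math

def entropic_risk(costs, probs, gamma):
    """Risk-sensitive value (1/gamma) * log E_p[exp(gamma * X)], shifted by max cost for stability."""
    m = max(costs)
    s = sum(p * math.exp(gamma * (c - m)) for c, p in zip(costs, probs))
    return m + math.log(s) / gamma

def worst_case_distribution(costs, probs, gamma):
    """Exponentially tilted distribution q_i proportional to p_i * exp(gamma * c_i)."""
    m = max(costs)
    weights = [p * math.exp(gamma * (c - m)) for c, p in zip(costs, probs)]
    z = sum(weights)
    return [w / z for w in weights]

def kl_divergence(q, p):
    """Kullback-Leibler divergence KL(q || p) for finite distributions."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def penalized_expectation(costs, q, p, gamma):
    """Robust objective for one candidate q: E_q[X] - (1/gamma) * KL(q || p)."""
    expected_cost = sum(qi * c for qi, c in zip(q, costs))
    return expected_cost - kl_divergence(q, p) / gamma

# Illustrative numbers only (not taken from the paper).
costs = [1.0, 2.0, 5.0]   # possible cumulative costs
probs = [0.5, 0.3, 0.2]   # nominal probabilities p
gamma = 0.7               # risk-sensitivity parameter, assumed positive

risk = entropic_risk(costs, probs, gamma)
q_star = worst_case_distribution(costs, probs, gamma)
robust = penalized_expectation(costs, q_star, probs, gamma)
nominal = penalized_expectation(costs, probs, probs, gamma)  # q = p, zero penalty

print(risk, robust, nominal)
```

Here `risk` and `robust` agree up to floating-point error, while `nominal` (the plain expectation under the nominal probabilities) is no larger than either, which is what one expects when the adversary may shift probability mass toward costly outcomes at a KL price.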
Cite
Text
Osogami. "Robustness and Risk-Sensitivity in Markov Decision Processes." Neural Information Processing Systems, 2012.
Markdown
[Osogami. "Robustness and Risk-Sensitivity in Markov Decision Processes." Neural Information Processing Systems, 2012.](https://mlanthology.org/neurips/2012/osogami2012neurips-robustness/)
BibTeX
@inproceedings{osogami2012neurips-robustness,
title = {{Robustness and Risk-Sensitivity in Markov Decision Processes}},
author = {Osogami, Takayuki},
booktitle = {Neural Information Processing Systems},
year = {2012},
pages = {233-241},
url = {https://mlanthology.org/neurips/2012/osogami2012neurips-robustness/}
}