Robustness and Risk-Sensitivity in Markov Decision Processes

Abstract

We uncover relations between robust MDPs and risk-sensitive MDPs. The objective of a robust MDP is to minimize a function, such as the expectation of the cumulative cost, in the worst case over uncertain parameters. The objective of a risk-sensitive MDP is to minimize a risk measure of the cumulative cost when the parameters are known. We show that a risk-sensitive MDP that minimizes the expected exponential utility is equivalent to a robust MDP that minimizes the worst-case expectation with a penalty for the deviation of the uncertain parameters from their nominal values, where the deviation is measured with the Kullback-Leibler divergence. We also show that a risk-sensitive MDP that minimizes an iterated risk measure composed of certain coherent risk measures is equivalent to a robust MDP that minimizes the worst-case expectation when the possible deviations of the uncertain parameters from their nominal values are characterized with a concave function.
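The first equivalence can be illustrated in a single step with the standard Gibbs (Donsker-Varadhan) variational identity, which states that for a risk parameter gamma > 0, (1/gamma) log E_p[exp(gamma X)] equals the maximum over distributions q of E_q[X] - (1/gamma) KL(q || p). The sketch below checks this numerically for a finite cost distribution. It is a minimal illustration under stated assumptions, not the paper's MDP construction: the function names, the example costs, and the choice of gamma are all hypothetical, and only the one-step identity is shown.

```python
import math

def entropic_risk(costs, probs, gamma):
    """(1/gamma) * log E_p[exp(gamma * cost)], via a stable log-sum-exp.

    Assumes gamma > 0 (risk-averse case); all names here are illustrative.
    """
    m = max(gamma * c for c in costs)
    s = sum(p * math.exp(gamma * c - m) for c, p in zip(costs, probs))
    return (m + math.log(s)) / gamma

def kl_divergence(q, p):
    """KL(q || p) for finite distributions; terms with q_i = 0 contribute 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def penalized_expectation(costs, q, p, gamma):
    """E_q[cost] minus a KL penalty for deviating from the nominal p."""
    mean_q = sum(qi * c for qi, c in zip(q, costs))
    return mean_q - kl_divergence(q, p) / gamma

def worst_case_distribution(costs, probs, gamma):
    """Exponentially tilted distribution q_i proportional to p_i * exp(gamma * c_i).

    This q attains the maximum of the penalized expectation.
    """
    m = max(gamma * c for c in costs)
    w = [p * math.exp(gamma * c - m) for c, p in zip(costs, probs)]
    z = sum(w)
    return [wi / z for wi in w]

if __name__ == "__main__":
    costs = [1.0, 2.0, 5.0]   # hypothetical one-step costs
    probs = [0.5, 0.3, 0.2]   # hypothetical nominal distribution
    gamma = 0.7               # hypothetical risk-sensitivity parameter
    q_star = worst_case_distribution(costs, probs, gamma)
    print("risk-sensitive value:", entropic_risk(costs, probs, gamma))
    print("robust penalized value:", penalized_expectation(costs, q_star, probs, gamma))
```

The two printed values should agree up to floating-point error, while any other choice of q gives a penalized expectation that is no larger. A smaller gamma penalizes deviation from the nominal distribution more heavily, so the adversary stays closer to it.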

Cite

Text

Osogami. "Robustness and Risk-Sensitivity in Markov Decision Processes." Neural Information Processing Systems, 2012.

Markdown

[Osogami. "Robustness and Risk-Sensitivity in Markov Decision Processes." Neural Information Processing Systems, 2012.](https://mlanthology.org/neurips/2012/osogami2012neurips-robustness/)

BibTeX

@inproceedings{osogami2012neurips-robustness,
  title     = {{Robustness and Risk-Sensitivity in Markov Decision Processes}},
  author    = {Osogami, Takayuki},
  booktitle = {Neural Information Processing Systems},
  year      = {2012},
  pages     = {233--241},
  url       = {https://mlanthology.org/neurips/2012/osogami2012neurips-robustness/}
}