Distributionally Robust Markov Decision Processes
Abstract
We consider Markov decision processes where the values of the parameters are uncertain. This uncertainty is described by a sequence of nested sets (that is, each set contains the previous one), each of which corresponds to a probabilistic guarantee for a different confidence level, so that a set of admissible probability distributions of the unknown parameters is specified. This formulation models the case where the decision maker is aware of, and wants to exploit, some imprecise a priori information about the distribution of the parameters, and it arises naturally in practice, where methods for estimating confidence regions of parameters abound. We propose a decision criterion based on *distributional robustness*: the optimal policy maximizes the expected total reward under the most adversarial admissible probability distribution over realizations of the uncertain parameters (i.e., the most adversarial distribution that agrees with the a priori information). We show that finding the optimal distributionally robust policy can be reduced to solving a standard robust MDP where the parameters belong to a single uncertainty set, and hence it can be computed in polynomial time under mild technical conditions.
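To make the reduction concrete, here is a minimal sketch, not the paper's construction: it assumes each nested confidence set is an L1 ball around a nominal transition distribution (with per-state rectangularity), and it computes a distributionally robust Bellman backup as a weighted combination of per-level worst cases, plugged into standard value iteration. The function names, the L1-ball choice, and the toy numbers below are all illustrative assumptions.

```python
import numpy as np

def worst_case_expectation(p_hat, v, eps):
    # Minimize p @ v over {p in the simplex : ||p - p_hat||_1 <= eps}.
    # Greedy mass transfer: move up to eps/2 probability mass from the
    # highest-value states onto the single lowest-value state (moving
    # mass m changes the L1 distance by 2m).
    p = p_hat.astype(float).copy()
    worst = int(np.argmin(v))
    budget = eps / 2.0
    for s in np.argsort(v)[::-1]:          # drain high-value states first
        if s == worst:
            continue
        if budget <= 0:
            break
        moved = min(budget, p[s])
        p[s] -= moved
        p[worst] += moved
        budget -= moved
    return float(p @ v)

def dr_expectation(p_hat, v, radii, weights):
    # Nested L1 balls with nondecreasing radii; weights[i] is the prior
    # mass the confidence information assigns to level i (weights sum
    # to 1). The distributionally robust value is sketched here as the
    # weighted sum of per-level worst cases, a single-set reduction in
    # the spirit of the abstract's result (illustrative assumption).
    return sum(w * worst_case_expectation(p_hat, v, r)
               for r, w in zip(radii, weights))

def dr_value_iteration(P_hat, R, radii, weights, gamma=0.95, tol=1e-8):
    # P_hat: nominal transitions, shape (S, A, S); R: rewards, shape (S, A).
    S, A, _ = P_hat.shape
    v = np.zeros(S)
    while True:
        q = np.array([[R[s, a] + gamma * dr_expectation(P_hat[s, a], v,
                                                        radii, weights)
                       for a in range(A)] for s in range(S)])
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q.argmax(axis=1)
        v = v_new

# Toy 2-state, 2-action example with made-up nominal transitions.
P_hat = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0], [0.0, 2.0]])
v, policy = dr_value_iteration(P_hat, R, radii=[0.05, 0.2], weights=[0.7, 0.3])
print(v, policy)
```

Because the inner worst case is computed independently per state-action pair, each backup is a gamma-contraction and the iteration converges; the per-level worst cases over L1 balls are solvable greedily, which is what keeps the overall computation polynomial in this sketch.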
Cite
Text
Xu and Mannor. "Distributionally Robust Markov Decision Processes." Neural Information Processing Systems, 2010.
Markdown
[Xu and Mannor. "Distributionally Robust Markov Decision Processes." Neural Information Processing Systems, 2010.](https://mlanthology.org/neurips/2010/xu2010neurips-distributionally/)
BibTeX
@inproceedings{xu2010neurips-distributionally,
  title     = {{Distributionally Robust Markov Decision Processes}},
  author    = {Xu, Huan and Mannor, Shie},
  booktitle = {Neural Information Processing Systems},
  year      = {2010},
  pages     = {2505--2513},
  url       = {https://mlanthology.org/neurips/2010/xu2010neurips-distributionally/}
}