Percentile Optimization in Uncertain Markov Decision Processes with Application to Efficient Exploration

Abstract

Markov decision processes are an effective tool for modeling decision-making in uncertain dynamic environments. Since the parameters of these models are typically estimated from data, learned from experience, or designed by hand, it is not surprising that the actual performance of a chosen strategy often differs significantly from the designer's initial expectations due to unavoidable model uncertainty. In this paper, we present a percentile criterion that captures the trade-off between optimistic and pessimistic points of view on an MDP with parameter uncertainty. We describe tractable methods that take parameter uncertainty into account in the process of decision making. Finally, we propose a cost-effective exploration strategy for settings where it is possible to invest resources (money, time, or computational effort) in actions that reduce the uncertainty in the parameters.
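To make the percentile criterion concrete, here is a minimal, hypothetical sketch (not the paper's method): for a fixed policy on a toy two-state Markov chain whose transition probability is uncertain, we sample that parameter from an assumed Beta distribution, compute the policy's value under each sample, and report the value level that is achieved with probability at least 1 − ε. All names, the chain structure, and the Beta uncertainty model are illustrative assumptions.

```python
import random

# Hypothetical setup: fixed policy induces the chain [[p, 1-p], [0, 1]],
# where p (prob. of staying in state 0) is uncertain.  Reward r0 in
# state 0, r1 in the absorbing state 1, discount gamma.

def value_state0(p, r0=1.0, r1=0.0, gamma=0.9, iters=500):
    """Value of state 0 under the fixed policy, by value iteration."""
    v0 = v1 = 0.0
    for _ in range(iters):
        v1 = r1 + gamma * v1                       # state 1 is absorbing
        v0 = r0 + gamma * (p * v0 + (1 - p) * v1)  # expected next value
    return v0

def percentile_value(epsilon=0.1, samples=2000, alpha=8.0, beta=2.0, seed=0):
    """Monte Carlo estimate of the epsilon-percentile criterion:
    the largest y with P(value >= y) >= 1 - epsilon, where the
    uncertain parameter p is drawn from an assumed Beta(alpha, beta)."""
    rng = random.Random(seed)
    vals = sorted(value_state0(rng.betavariate(alpha, beta))
                  for _ in range(samples))
    return vals[int(epsilon * samples)]
```

A small ε (pessimistic: guarantee the value with high probability) yields a lower certified value than ε = 0.5 (the median, a more optimistic stance), which is exactly the trade-off the criterion trades across.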

Cite

Text

Delage and Mannor. "Percentile Optimization in Uncertain Markov Decision Processes with Application to Efficient Exploration." International Conference on Machine Learning, 2007. doi:10.1145/1273496.1273525

Markdown

[Delage and Mannor. "Percentile Optimization in Uncertain Markov Decision Processes with Application to Efficient Exploration." International Conference on Machine Learning, 2007.](https://mlanthology.org/icml/2007/delage2007icml-percentile/) doi:10.1145/1273496.1273525

BibTeX

@inproceedings{delage2007icml-percentile,
  title     = {{Percentile Optimization in Uncertain Markov Decision Processes with Application to Efficient Exploration}},
  author    = {Delage, Erick and Mannor, Shie},
  booktitle = {International Conference on Machine Learning},
  year      = {2007},
  pages     = {225--232},
  doi       = {10.1145/1273496.1273525},
  url       = {https://mlanthology.org/icml/2007/delage2007icml-percentile/}
}