Percentile Optimization in Uncertain Markov Decision Processes with Application to Efficient Exploration
Abstract
Markov decision processes are an effective tool for modeling decision-making in uncertain dynamic environments. Since the parameters of these models are typically estimated from data, learned from experience, or designed by hand, it is not surprising that the actual performance of a chosen strategy often differs significantly from the designer's initial expectations due to unavoidable model uncertainty. In this paper, we present a percentile criterion that captures the trade-off between optimistic and pessimistic points of view on an MDP with parameter uncertainty. We describe tractable methods that take parameter uncertainty into account in the process of decision making. Finally, we propose a cost-effective exploration strategy for settings where it is possible to invest (money, time, or computational effort) in actions that will reduce the uncertainty in the parameters.
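The percentile criterion described in the abstract can be illustrated with a simple Monte Carlo sketch: sample transition matrices from a distribution over the uncertain parameters, evaluate a fixed policy's value under each sample, and report the value level that the policy achieves with probability at least 1 − ε. The MDP below (a two-state chain with Dirichlet-distributed transition rows) and all numbers in it are hypothetical illustrations, not taken from the paper; the paper itself develops tractable optimization methods rather than this brute-force estimate.

```python
import numpy as np

def policy_value(P, R, gamma=0.9):
    """Value of a fixed policy: solve V = R + gamma * P @ V."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P, R)

def percentile_value(alpha_counts, R, start, eps=0.1,
                     n_samples=2000, gamma=0.9, seed=0):
    """Monte Carlo estimate of the eps-percentile of the policy's value
    at state `start`, with each transition row drawn from a Dirichlet
    whose pseudo-counts encode the parameter uncertainty."""
    rng = np.random.default_rng(seed)
    vals = np.empty(n_samples)
    for i in range(n_samples):
        P = np.stack([rng.dirichlet(c) for c in alpha_counts])
        vals[i] = policy_value(P, R, gamma)[start]
    # The policy attains at least this value with prob. ~ (1 - eps).
    return np.percentile(vals, 100 * eps)

# Hypothetical 2-state chain: pseudo-counts per state, reward 1 in state 0.
counts = np.array([[8.0, 2.0], [3.0, 7.0]])
R = np.array([1.0, 0.0])
y = percentile_value(counts, R, start=0)
```

Comparing `y` across candidate policies (or across ε) makes the optimistic/pessimistic trade-off concrete: ε near 0 gives a conservative, robust guarantee, while ε near 1 recovers an optimistic evaluation.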
Cite
Text
Delage and Mannor. "Percentile Optimization in Uncertain Markov Decision Processes with Application to Efficient Exploration." International Conference on Machine Learning, 2007. doi:10.1145/1273496.1273525
Markdown
[Delage and Mannor. "Percentile Optimization in Uncertain Markov Decision Processes with Application to Efficient Exploration." International Conference on Machine Learning, 2007.](https://mlanthology.org/icml/2007/delage2007icml-percentile/) doi:10.1145/1273496.1273525
BibTeX
@inproceedings{delage2007icml-percentile,
title = {{Percentile Optimization in Uncertain Markov Decision Processes with Application to Efficient Exploration}},
author = {Delage, Erick and Mannor, Shie},
booktitle = {International Conference on Machine Learning},
year = {2007},
pages = {225--232},
doi = {10.1145/1273496.1273525},
url = {https://mlanthology.org/icml/2007/delage2007icml-percentile/}
}