A Geometric Approach to Find Nondominated Policies to Imprecise Reward MDPs

da Silva, Valdinei Freire; Costa, Anna Helena Reali

doi:10.1007/978-3-642-23780-5_38

ECML-PKDD 2011 pp. 439-454

Abstract

Markov Decision Processes (MDPs) provide a mathematical framework for modelling the decision-making of agents acting in stochastic environments, in which transition probabilities model the environment dynamics and a reward function evaluates the agent's behaviour. Lately, however, special attention has been drawn to the difficulty of specifying the reward function precisely, which has motivated research on MDPs with imprecisely specified rewards. Some of these works exploit nondominated policies, which are policies that are optimal for some instantiation of the imprecise reward function. One algorithm that computes nondominated policies is π Witness, and nondominated policies are used to make decisions under the minimax regret criterion. An interesting question is how to define a small subset of nondominated policies so that the minimax regret can be computed faster but still accurately. We modified π Witness to do so. We also present the π Hull algorithm, which computes nondominated policies by adopting a geometric approach. Under the assumption that reward functions are linear in a set of features, we show empirically that π Hull can be faster than our modified version of π Witness.
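To make the minimax regret criterion from the abstract concrete, here is a minimal Python sketch under stated assumptions. Each candidate (e.g. nondominated) policy is represented only by a precomputed feature-expectation vector μ. The reward is a weight vector w over features, so a policy's value is w·μ. The imprecise reward set is a polytope given by its vertices. The function names (`value`, `max_regret`, `minimax_regret`) and the toy numbers are hypothetical. The decision is also restricted to the finite candidate set (no stochastic mixtures), so this is neither π Witness nor π Hull, and not necessarily the exact regret formulation used in the paper.

```python
from itertools import product

def value(w, mu):
    """Value of a policy with feature expectations mu under reward weights w
    (assumes the reward is linear in the features)."""
    return sum(wi * mi for wi, mi in zip(w, mu))

def max_regret(mu, candidates, weight_vertices):
    """Worst-case regret of committing to mu: an adversary picks a competing
    policy from candidates and a weight vector w. For a fixed competitor the
    regret w.(mu' - mu) is linear in w, so its maximum over the (assumed
    polytopic) reward set is attained at one of the vertices."""
    return max(value(w, other) - value(w, mu)
               for other, w in product(candidates, weight_vertices))

def minimax_regret(candidates, weight_vertices):
    """Return (index, regret) of the candidate with the smallest worst-case
    regret; ties go to the first such candidate."""
    regrets = [max_regret(mu, candidates, weight_vertices) for mu in candidates]
    best = min(range(len(candidates)), key=regrets.__getitem__)
    return best, regrets[best]

if __name__ == "__main__":
    # Hypothetical toy case: two features, weights on the segment from (1, 0) to (0, 1).
    policies = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6)]
    vertices = [(1.0, 0.0), (0.0, 1.0)]
    print(minimax_regret(policies, vertices))  # → (2, 0.4)
```

In this toy case the third policy, (0.6, 0.6), is optimal for w = (0.5, 0.5) and is therefore nondominated. Its worst-case regret of 0.4 beats the regret of 1.0 incurred by committing to either extreme policy. The sketch also shows why a small candidate set matters: the cost of `max_regret` grows with the number of candidates times the number of reward-set vertices.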

Cite

Text

da Silva and Costa. "A Geometric Approach to Find Nondominated Policies to Imprecise Reward MDPs." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2011. doi:10.1007/978-3-642-23780-5_38

Markdown

[da Silva and Costa. "A Geometric Approach to Find Nondominated Policies to Imprecise Reward MDPs." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2011.](https://mlanthology.org/ecmlpkdd/2011/dasilva2011ecmlpkdd-geometric/) doi:10.1007/978-3-642-23780-5_38

BibTeX

@inproceedings{dasilva2011ecmlpkdd-geometric,
  title     = {{A Geometric Approach to Find Nondominated Policies to Imprecise Reward MDPs}},
  author    = {da Silva, Valdinei Freire and Costa, Anna Helena Reali},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2011},
  pages     = {439-454},
  doi       = {10.1007/978-3-642-23780-5_38},
  url       = {https://mlanthology.org/ecmlpkdd/2011/dasilva2011ecmlpkdd-geometric/}
}