Interactive Value Iteration for Markov Decision Processes with Unknown Rewards

Abstract

To tackle the potentially hard task of defining the reward function in a Markov Decision Process, we propose a new approach, based on Value Iteration, which interweaves the elicitation and optimization phases. We assume that rewards whose numeric values are unknown can only be ordered, and that a tutor is present to help comparing sequences of re- wards. We first show how the set of possible reward functions for a given preference relation can be rep- resented as a polytope. Then our algorithm, called Interactive Value Iteration, searches for an optimal policy while refining its knowledge about the pos- sible reward functions, by querying a tutor when necessary. We prove that the number of queries needed before finding an optimal policy is upper- bounded by a polynomial in the size of the problem, and we present experimental results which demon- strate that our approach is efficient in practice.

Cite

Text

Weng and Zanuttini. "Interactive Value Iteration for Markov Decision Processes with Unknown Rewards." International Joint Conference on Artificial Intelligence, 2013.

Markdown

[Weng and Zanuttini. "Interactive Value Iteration for Markov Decision Processes with Unknown Rewards." International Joint Conference on Artificial Intelligence, 2013.](https://mlanthology.org/ijcai/2013/weng2013ijcai-interactive/)

BibTeX

@inproceedings{weng2013ijcai-interactive,
  title     = {{Interactive Value Iteration for Markov Decision Processes with Unknown Rewards}},
  author    = {Weng, Paul and Zanuttini, Bruno},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2013},
  pages     = {2415-2421},
  url       = {https://mlanthology.org/ijcai/2013/weng2013ijcai-interactive/}
}