Robust Online Optimization of Reward-Uncertain MDPs
Abstract
Imprecise-reward Markov decision processes (IRMDPs) are MDPs in which the reward function is only partially specified (e.g., by some elicitation process). Recent work using minimax regret to solve IRMDPs has shown, despite their theoretical intractability, how the set of policies that are nondominated w.r.t. reward uncertainty can be exploited to accelerate regret computation. However, the number of nondominated policies is generally so large as to undermine this leverage. In this paper, we show how the quality of the approximation can be improved online by pruning/adding nondominated policies during reward elicitation, while maintaining computational tractability. Drawing insights from the POMDP literature, we also develop a new anytime algorithm for constructing the set of nondominated policies with provable (anytime) error bounds. These bounds can be exploited to great effect in our online approximation scheme.
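The abstract's core object, minimax regret over a set of nondominated policies, can be illustrated with a minimal sketch (not the authors' algorithm). Here each candidate policy is summarized by a hypothetical occupancy-feature vector, so its value under a reward vector r is a dot product, and reward uncertainty is approximated by sampled vertices of the feasible reward polytope:

```python
import numpy as np

def minimax_regret(features, reward_vertices):
    """Pick the policy minimizing worst-case regret.

    features:        (num_policies, d) array; row i is policy i's
                     expected-feature (occupancy) vector, so its value
                     under reward r is features[i] @ r.
    reward_vertices: (num_vertices, d) array of sampled reward vectors
                     approximating the uncertainty set.
    Returns (index of minimax-regret policy, its max regret).
    """
    values = features @ reward_vertices.T   # value of each policy at each vertex
    best_per_r = values.max(axis=0)         # best achievable value per vertex
    regret = best_per_r - values            # regret of each policy per vertex
    max_regret = regret.max(axis=1)         # worst case over reward uncertainty
    i = int(max_regret.argmin())
    return i, float(max_regret[i])

# Toy example: three policies, two reward hypotheses (all values hypothetical).
F = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.6, 0.6]])
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])
idx, mr = minimax_regret(F, R)
print(idx, mr)  # the "hedging" third policy wins: index 2, max regret 0.4
```

In this toy instance the two extreme policies each risk regret 1.0 under the opposing reward hypothesis, while the balanced policy caps regret at 0.4, which is the robustness property minimax regret buys.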
Cite
Regan and Boutilier. "Robust Online Optimization of Reward-Uncertain MDPs." International Joint Conference on Artificial Intelligence, 2011. doi:10.5591/978-1-57735-516-8/IJCAI11-361
@inproceedings{regan2011ijcai-robust,
title = {{Robust Online Optimization of Reward-Uncertain MDPs}},
author = {Regan, Kevin and Boutilier, Craig},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2011},
  pages = {2165--2171},
doi = {10.5591/978-1-57735-516-8/IJCAI11-361},
url = {https://mlanthology.org/ijcai/2011/regan2011ijcai-robust/}
}