Bounding the Suboptimality of Reusing Subproblems
Abstract
We are interested in the problem of determining a course of action to achieve a desired objective in a non-deterministic environment. Markov decision processes (MDPs) provide a framework for representing this action selection problem, and there are a number of algorithms that learn optimal policies within this formulation. This framework has also been used to study state space abstraction, problem decomposition, and policy reuse. These techniques sacrifice optimality of their solution for improved learning speed. In this paper we examine the suboptimality of reusing policies that are solutions to subproblems. This is done within a restricted class of MDPs, namely those where non-zero reward is received only upon reaching a goal state. We introduce the definition of a subproblem within this class and provide motivation for how reuse of subproblem solutions can speed up learning. The contribution of this paper is the derivation of a tight bound on the loss in optimality from this reuse. We examine a bound that is based on Bellman error, which applies to all MDPs, but is not tight enough to be useful. We contribute our own theoretical result that gives an empirically tight bound on this suboptimality.
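As a hedged illustration of the restricted MDP class the abstract describes (this sketch is not from the paper itself): reward is zero everywhere except upon reaching a goal state, so the optimal value of a state reflects only its discounted proximity to the goal. A minimal value-iteration sketch on a hypothetical 5-state chain with the goal at one end:

```python
# Minimal sketch under assumed details (chain MDP, reward 1 only on
# transitions into the goal, discount GAMMA); not the paper's algorithm.
GAMMA = 0.9
STATES = range(5)      # hypothetical chain of states 0..4
GOAL = 4               # goal state at the right end
ACTIONS = (-1, +1)     # step left / step right

def step(s, a):
    """Deterministic transition; clamp to the ends of the chain."""
    return max(0, min(4, s + a))

def value_iteration(tol=1e-6):
    V = [0.0] * 5
    while True:
        delta = 0.0
        for s in STATES:
            if s == GOAL:
                continue  # goal is absorbing; reward is collected on entry
            best = max(
                (1.0 if step(s, a) == GOAL else 0.0) + GAMMA * V[step(s, a)]
                for a in ACTIONS
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

V = value_iteration()
# Values decay geometrically with distance to the goal:
# V[3] = 1.0, V[2] = GAMMA, V[1] = GAMMA**2, V[0] = GAMMA**3
```

Because all reward is concentrated at the goal, the value function is just `GAMMA` raised to the shortest-path distance; this is the structure that makes subproblem solutions (policies that reach intermediate goals) natural candidates for reuse.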
Cite
Text
Bowling and Veloso. "Bounding the Suboptimality of Reusing Subproblems." International Joint Conference on Artificial Intelligence, 1999.
Markdown
[Bowling and Veloso. "Bounding the Suboptimality of Reusing Subproblems." International Joint Conference on Artificial Intelligence, 1999.](https://mlanthology.org/ijcai/1999/bowling1999ijcai-bounding/)
BibTeX
@inproceedings{bowling1999ijcai-bounding,
title = {{Bounding the Suboptimality of Reusing Subproblems}},
author = {Bowling, Michael H. and Veloso, Manuela M.},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {1999},
pages = {1340-1347},
url = {https://mlanthology.org/ijcai/1999/bowling1999ijcai-bounding/}
}