Improved Switching Among Temporally Abstract Actions

Abstract

In robotics and other control applications it is commonplace to have a pre-existing set of controllers for solving subtasks, perhaps hand-crafted or previously learned or planned, and still face a difficult problem of how to choose and switch among the controllers to solve an overall task as well as possible. In this paper we present a framework based on Markov decision processes and semi-Markov decision processes for phrasing this problem, a basic theorem regarding the improvement in performance that can be obtained by switching flexibly between given controllers, and example applications of the theorem. In particular, we show how an agent can plan with these high-level controllers and then use the results of such planning to find an even better plan, by modifying the existing controllers, with negligible additional cost and no re-planning. In one of our examples, the complexity of the problem is reduced from 24 billion state-action pairs to less than a million state-controller pairs.
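The abstract's key move is that, once values of the given controllers have been computed by SMDP planning, the agent can improve on the plan at essentially no cost by switching away from a controller mid-execution whenever another one looks strictly better from the current state. A minimal sketch of that switching rule, assuming an option-value table `q[(state, option)]` has already been computed (the function name and the toy values below are illustrative, not from the paper):

```python
def improved_switch(state, current_option, q, options):
    """Return the option to follow from `state`, interrupting
    `current_option` whenever another option is strictly better."""
    best = max(options, key=lambda o: q[(state, o)])
    if q[(state, best)] > q[(state, current_option)]:
        return best          # interrupt and switch controllers
    return current_option    # otherwise keep executing as planned

# Toy value table: two controllers whose relative value flips
# as the state evolves, so interruption pays off at state 1.
options = ["dock", "cruise"]
q = {
    (0, "dock"): 1.0, (0, "cruise"): 3.0,  # cruise best at state 0
    (1, "dock"): 4.0, (1, "cruise"): 2.0,  # dock best at state 1
}
```

Calling `improved_switch(0, "cruise", q, options)` keeps following `"cruise"`, while `improved_switch(1, "cruise", q, options)` interrupts and returns `"dock"`. The paper's theorem guarantees that executing with this kind of flexible switching performs at least as well as running each controller to its planned termination.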

Cite

Text

Sutton et al. "Improved Switching Among Temporally Abstract Actions." Neural Information Processing Systems, 1998.

Markdown

[Sutton et al. "Improved Switching Among Temporally Abstract Actions." Neural Information Processing Systems, 1998.](https://mlanthology.org/neurips/1998/sutton1998neurips-improved/)

BibTeX

@inproceedings{sutton1998neurips-improved,
  title     = {{Improved Switching Among Temporally Abstract Actions}},
  author    = {Sutton, Richard S. and Singh, Satinder P. and Precup, Doina and Ravindran, Balaraman},
  booktitle = {Neural Information Processing Systems},
  year      = {1998},
  pages     = {1066--1072},
  url       = {https://mlanthology.org/neurips/1998/sutton1998neurips-improved/}
}