Theoretical Results on Reinforcement Learning with Temporally Abstract Options

Abstract

We present new theoretical results on planning within the framework of temporally abstract reinforcement learning (Precup & Sutton, 1997; Sutton, 1995). Temporal abstraction is a key step in any decision-making system that involves planning and prediction. In temporally abstract reinforcement learning, the agent is allowed to choose among “options”, whole courses of action that may be temporally extended, stochastic, and contingent on previous events. Examples of options include closed-loop policies such as picking up an object, as well as primitive actions such as joint torques. Knowledge about the consequences of options is represented by special structures called multi-time models. In this paper we focus on the theory of planning with multi-time models. We define new Bellman equations that are satisfied by sets of multi-time models. As a consequence, multi-time models can be used interchangeably with models of primitive actions in a variety of well-known planning methods, including value iteration, policy improvement, and policy iteration.
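To make the abstract's claim concrete, here is a minimal sketch of value iteration carried out directly over option models, in the spirit of the Bellman equations the paper describes. The tiny three-state chain and the two hand-made option models below are illustrative assumptions, not taken from the paper; the key idea is that each option is summarized by a reward part and a discounted terminal-state part, so that a temporally extended option backs up exactly like a primitive action.

```python
import numpy as np

# Each option o is summarized by a multi-time model (R_o, P_o):
#   R_o[s]     = expected discounted reward accumulated while o runs from s
#   P_o[s, s'] = expected discounted distribution over the state where o ends
# (the discount gamma is folded into P_o, so its rows sum to <= 1).

n_states = 3
gamma = 0.9

# Option "step": a primitive one-step action moving right (illustrative).
R_step = np.array([0.0, 0.0, 1.0])
P_step = gamma * np.array([
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],  # state 2 is absorbing
])

# Option "jump": a temporally extended option that takes two steps from
# state 0 straight to state 2, so its model carries gamma**2.
R_jump = np.zeros(n_states)
P_jump = np.zeros((n_states, n_states))
P_jump[0, 2] = gamma ** 2

options = [(R_step, P_step), (R_jump, P_jump)]

def value_iteration(options, n_states, iters=200):
    """Bellman backups over option models:
       V(s) <- max_o [ R_o(s) + sum_s' P_o(s, s') V(s') ]."""
    V = np.zeros(n_states)
    for _ in range(iters):
        V = np.max([R + P @ V for R, P in options], axis=0)
    return V

V = value_iteration(options, n_states)
```

Because the one-step and two-step options have the same interface (a reward vector and a discounted transition matrix), the backup treats them interchangeably, which is the point of the interchangeability result stated in the abstract.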

Cite

Text

Precup et al. "Theoretical Results on Reinforcement Learning with Temporally Abstract Options." European Conference on Machine Learning, 1998. doi:10.1007/BFB0026709

Markdown

[Precup et al. "Theoretical Results on Reinforcement Learning with Temporally Abstract Options." European Conference on Machine Learning, 1998.](https://mlanthology.org/ecmlpkdd/1998/precup1998ecml-theoretical/) doi:10.1007/BFB0026709

BibTeX

@inproceedings{precup1998ecml-theoretical,
  title     = {{Theoretical Results on Reinforcement Learning with Temporally Abstract Options}},
  author    = {Precup, Doina and Sutton, Richard S. and Singh, Satinder},
  booktitle = {European Conference on Machine Learning},
  year      = {1998},
  pages     = {382--393},
  doi       = {10.1007/BFB0026709},
  url       = {https://mlanthology.org/ecmlpkdd/1998/precup1998ecml-theoretical/}
}