Temporal Abstraction in Temporal-Difference Networks

Abstract

We present a generalization of temporal-difference networks to include temporally abstract options on the links of the question network. Temporal-difference (TD) networks have been proposed as a way of representing and learning a wide variety of predictions about the interaction between an agent and its environment. These predictions are compositional in that their targets are defined in terms of other predictions, and subjunctive in that they are about what would happen if an action or sequence of actions were taken. In conventional TD networks, the inter-related predictions are at successive time steps and contingent on a single action; here we generalize them to accommodate extended time intervals and contingency on whole ways of behaving. Our generalization is based on the options framework for temporal abstraction. The primary contribution of this paper is to introduce a new algorithm for intra-option learning in TD networks with function approximation and eligibility traces. We present empirical examples of our algorithm's effectiveness and of the greater representational expressiveness of temporally abstract TD networks.

The primary distinguishing feature of temporal-difference (TD) networks (Sutton & Tanner, 2005) is that they permit a general compositional specification of the goals of learning. The goals of learning are thought of as predictive questions asked by the agent, such as "What will I see if I step forward and look right?" or "If I open the fridge, will I see a bottle of beer?" Seeing a bottle of beer is of course a complicated perceptual act. It might be thought of as obtaining a set of predictions about what would happen if certain reaching and grasping actions were taken, about what would happen if the bottle were opened and turned upside down, and about what the bottle would look like if viewed from various angles. To predict seeing a bottle of beer is thus to make a prediction about a set of other predictions. The target for the overall prediction is a composition, in the mathematical sense, of the first prediction with each of the other predictions.

TD networks are the first framework for representing the goals of predictive learning in a compositional, machine-accessible form. Each node of a TD network represents an individual question (something to be predicted) and has associated with it a value representing an answer to that question (a prediction of that something). The questions are represented by a set of directed links between nodes. If node 1 is linked to node 2, then node 1 represents a question whose answer is defined in terms of node 2's prediction.
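To make the compositional structure concrete, the following is a minimal Python sketch of a three-node question network whose targets are chained in exactly this way, trained with a generic TD(lambda)-style update using linear-sigmoid function approximation and accumulating eligibility traces. It illustrates the idea only and is not the intra-option algorithm introduced in the paper; the feature vector, the node wiring, and the single target-action condition (a simple stand-in for conditioning on a whole way of behaving, i.e. an option) are assumptions made for the example.

import numpy as np

# Toy question network (illustrative only):
#   node 0: "what bit will I observe next if I take the target action?"
#   node 1: "what will node 0 predict next, if I take the target action?"
#   node 2: "what will node 1 predict next, if I take the target action?"
# Node i > 0 is compositional: its target is node i-1's prediction one step later.

rng = np.random.default_rng(0)
n_nodes, n_features = 3, 8
W = np.zeros((n_nodes, n_features))   # linear weights, one row per question node
E = np.zeros((n_nodes, n_features))   # accumulating eligibility traces, one row per node
alpha, lam = 0.1, 0.9                 # step size and trace-decay parameter (assumed values)

def predictions(x):
    """Answers of all nodes for feature vector x (sigmoid of a linear form)."""
    return 1.0 / (1.0 + np.exp(-W @ x))

def td_network_update(x_t, a_t, x_tp1, obs_tp1, target_action=0):
    """One TD(lambda)-style update of all question nodes.
    Generic sketch, not the paper's intra-option learning rule."""
    global W, E
    y_t, y_tp1 = predictions(x_t), predictions(x_tp1)
    # Compositional targets: node 0 predicts the next observation bit,
    # node i (i > 0) predicts node i-1's prediction at the next step.
    z = np.empty(n_nodes)
    z[0] = obs_tp1
    z[1:] = y_tp1[:-1]
    if a_t == target_action:
        # Questions are conditioned on the target action: update only when
        # the behaviour satisfied the condition, otherwise cut the traces.
        E = lam * E + np.outer(y_t * (1.0 - y_t), x_t)   # gradient of sigmoid outputs
        W += alpha * (z - y_t)[:, None] * E
    else:
        E[:] = 0.0

# Minimal usage with random features standing in for an agent-environment loop.
for t in range(1000):
    x_t = rng.integers(0, 2, n_features).astype(float)
    a_t = int(rng.integers(0, 2))
    x_tp1 = rng.integers(0, 2, n_features).astype(float)
    td_network_update(x_t, a_t, x_tp1, obs_tp1=float(x_tp1[0]))

The paper's generalization can be read, roughly, as replacing the single "was the target action taken?" test with "was the behaviour consistent with a given option?", allowing predictions over extended time intervals rather than a single step.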

Cite

Text

Rafols et al. "Temporal Abstraction in Temporal-Difference Networks." Neural Information Processing Systems, 2005.

Markdown

[Rafols et al. "Temporal Abstraction in Temporal-Difference Networks." Neural Information Processing Systems, 2005.](https://mlanthology.org/neurips/2005/rafols2005neurips-temporal/)

BibTeX

@inproceedings{rafols2005neurips-temporal,
  title     = {{Temporal Abstraction in Temporal-Difference Networks}},
  author    = {Rafols, Eddie and Koop, Anna and Sutton, Richard S.},
  booktitle = {Neural Information Processing Systems},
  year      = {2005},
  pages     = {1313--1320},
  url       = {https://mlanthology.org/neurips/2005/rafols2005neurips-temporal/}
}