Temporal-Difference Networks

Abstract

We introduce a generalization of temporal-difference (TD) learning to networks of interrelated predictions. Rather than relating a single prediction to itself at a later time, as in conventional TD methods, a TD network relates each prediction in a set of predictions to other predictions in the set at a later time. TD networks can represent and apply TD learning to a much wider class of predictions than has previously been possible. Using a random-walk example, we show that these networks can be used to learn to predict by a fixed interval, which is not possible with conventional TD methods. Secondly, we show that if the inter-predictive relationships are made conditional on action, then the usual learning-efficiency advantage of TD methods over Monte Carlo (supervised learning) methods becomes particularly pronounced. Thirdly, we demonstrate that TD networks can learn predictive state representations that enable exact solution of a non-Markov problem. A very broad range of inter-predictive temporal relationships can be expressed in these networks. Overall we argue that TD networks represent a substantial extension of the abilities of TD methods and bring us closer to the goal of representing world knowledge in entirely predictive, grounded terms.
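The fixed-interval prediction idea from the random-walk example can be sketched as a minimal linear TD network. This is an illustrative reconstruction under stated assumptions, not the authors' code: node 0 predicts the next observation, and each node i > 0 predicts node i-1's value one step later, so node i estimates the observation i+1 steps ahead. The 5-state walk, tabular features, three prediction nodes, and the convention that the terminal observation repeats (so predictions at termination collapse to the observed outcome) are all simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_nodes, alpha = 5, 3, 0.03
W = np.zeros((n_nodes, n_states))        # linear "answer network", one row per node

def feat(s):
    """One-hot (tabular) feature vector for state s."""
    x = np.zeros(n_states)
    x[s] = 1.0
    return x

for _ in range(10_000):
    s = n_states // 2                    # start each walk in the middle state
    done = False
    while not done:
        x = feat(s)
        y = W @ x                        # current predictions of all nodes
        s2 = s + rng.choice((-1, 1))     # unbiased random-walk step
        done = s2 < 0 or s2 >= n_states
        obs = 1.0 if s2 >= n_states else 0.0  # observe 1 only on right termination
        # Predictions at t+1; assumption: the terminal observation repeats,
        # so at termination every node's next-step value equals the outcome.
        y2 = np.full(n_nodes, obs) if done else W @ feat(s2)
        # Inter-predictive targets ("question network"): node 0 targets the
        # next observation; node i targets node i-1's prediction at t+1.
        z = np.concatenate(([obs], y2[:-1]))
        W += alpha * np.outer(z - y, x)  # TD update of the answer network
        s = s2
```

After training, node 0's value in the rightmost state approaches 0.5 (the chance of terminating right on the next step), while deeper nodes bootstrap from shallower ones, which is the inter-predictive structure that lets a TD network predict by a fixed interval.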

Cite

Text

Sutton and Tanner. "Temporal-Difference Networks." Neural Information Processing Systems, 2004.

Markdown

[Sutton and Tanner. "Temporal-Difference Networks." Neural Information Processing Systems, 2004.](https://mlanthology.org/neurips/2004/sutton2004neurips-temporaldifference/)

BibTeX

@inproceedings{sutton2004neurips-temporaldifference,
  title     = {{Temporal-Difference Networks}},
  author    = {Sutton, Richard S. and Tanner, Brian},
  booktitle = {Neural Information Processing Systems},
  year      = {2004},
  pages     = {1377--1384},
  url       = {https://mlanthology.org/neurips/2004/sutton2004neurips-temporaldifference/}
}