Temporal-Difference Networks

Abstract

We introduce a generalization of temporal-difference (TD) learning to networks of interrelated predictions. Rather than relating a single prediction to itself at a later time, as in conventional TD methods, a TD network relates each prediction in a set of predictions to other predictions in the set at a later time. TD networks can represent and apply TD learning to a much wider class of predictions than has previously been possible. Using a random-walk example, we show that these networks can be used to learn to predict by a fixed interval, which is not possible with conventional TD methods. Secondly, we show that if the inter-predictive relationships are made conditional on action, then the usual learning-efficiency advantage of TD methods over Monte Carlo (supervised learning) methods becomes particularly pronounced. Thirdly, we demonstrate that TD networks can learn predictive state representations that enable exact solution of a non-Markov problem. A very broad range of inter-predictive temporal relationships can be expressed in these networks. Overall we argue that TD networks represent a substantial extension of the abilities of TD methods and bring us closer to the goal of representing world knowledge in entirely predictive, grounded terms.
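The fixed-interval prediction idea from the random-walk example can be sketched as a minimal linear TD network. This is an illustrative reconstruction under stated assumptions, not the authors' code: node 0 predicts the next observation, and each node i > 0 predicts node i-1's value one step later, so node i estimates the observation i+1 steps ahead. The 5-state walk, tabular features, three prediction nodes, and the convention that the terminal observation repeats (so predictions at termination collapse to the observed outcome) are all simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_nodes, alpha = 5, 3, 0.03
W = np.zeros((n_nodes, n_states))        # linear "answer network", one row per node

def feat(s):
    """One-hot (tabular) feature vector for state s."""
    x = np.zeros(n_states)
    x[s] = 1.0
    return x

for _ in range(10_000):
    s = n_states // 2                    # start each walk in the middle state
    done = False
    while not done:
        x = feat(s)
        y = W @ x                        # current predictions of all nodes
        s2 = s + rng.choice((-1, 1))     # unbiased random-walk step
        done = s2 < 0 or s2 >= n_states
        obs = 1.0 if s2 >= n_states else 0.0  # observe 1 only on right termination
        # Predictions at t+1; assumption: the terminal observation repeats,
        # so at termination every node's next-step value equals the outcome.
        y2 = np.full(n_nodes, obs) if done else W @ feat(s2)
        # Inter-predictive targets ("question network"): node 0 targets the
        # next observation; node i targets node i-1's prediction at t+1.
        z = np.concatenate(([obs], y2[:-1]))
        W += alpha * np.outer(z - y, x)  # TD update of the answer network
        s = s2
```

After training, node 0's value in the rightmost state approaches 0.5 (the chance of terminating right on the next step), while deeper nodes bootstrap from shallower ones, which is the inter-predictive structure that lets a TD network predict by a fixed interval.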

Cite

Text

Sutton and Tanner. "Temporal-Difference Networks." Neural Information Processing Systems, 2004.

Markdown

[Sutton and Tanner. "Temporal-Difference Networks." Neural Information Processing Systems, 2004.](https://mlanthology.org/neurips/2004/sutton2004neurips-temporaldifference/)

BibTeX

@inproceedings{sutton2004neurips-temporaldifference,
  title     = {{Temporal-Difference Networks}},
  author    = {Sutton, Richard S. and Tanner, Brian},
  booktitle = {Neural Information Processing Systems},
  year      = {2004},
  pages     = {1377--1384},
  url       = {https://mlanthology.org/neurips/2004/sutton2004neurips-temporaldifference/}
}