Temporal-Difference Networks with History

Abstract

Temporal-difference (TD) networks are a formalism for expressing and learning grounded world knowledge in a predictive form [Sutton and Tanner, 2005]. However, not all partially observable Markov decision processes can be efficiently learned with TD networks. In this paper, we extend TD networks by allowing the network-update process (answer network) to depend on the recent history of previous actions and observations rather than only on the most recent action and observation. We show that this extension enables the solution of a larger class of problems than can be solved by the original TD networks or by history-based methods alone. In addition, we apply TD networks
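The extension described in the abstract can be illustrated with a minimal sketch: an answer network whose input features are built from the last k action-observation pairs rather than only the most recent pair. All names, the feature encoding, and the delta-rule update below are illustrative assumptions, not the paper's actual formulation; in particular, `targets` stands in for the TD targets that would be supplied by the question network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class HistoryTDNetwork:
    """Illustrative sketch (hypothetical names): an answer network
    conditioned on a window of the last k (action, observation) pairs
    in addition to the previous predictions."""

    def __init__(self, n_predictions, n_actions, n_obs, k, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.k = k
        self.n_actions = n_actions
        self.n_obs = n_obs
        # features: k one-hot (action, observation) pairs,
        # plus the previous prediction vector, plus a bias term
        self.n_features = k * (n_actions + n_obs) + n_predictions + 1
        self.W = rng.normal(scale=0.1, size=(n_predictions, self.n_features))
        self.lr = lr
        self.y = np.zeros(n_predictions)   # current predictions
        self.history = [(0, 0)] * k        # zero-padded history window

    def _features(self):
        parts = []
        for a, o in self.history:
            a_vec = np.zeros(self.n_actions); a_vec[a] = 1.0
            o_vec = np.zeros(self.n_obs); o_vec[o] = 1.0
            parts += [a_vec, o_vec]
        return np.concatenate(parts + [self.y, [1.0]])

    def step(self, action, obs, targets):
        """Advance one time step: slide the history window, recompute
        the predictions, and apply a delta-rule update toward `targets`
        (a stand-in for the question network's TD targets)."""
        self.history = self.history[1:] + [(action, obs)]
        x = self._features()
        y_new = sigmoid(self.W @ x)
        # gradient of squared error through the sigmoid nonlinearity
        delta = (targets - y_new) * y_new * (1.0 - y_new)
        self.W += self.lr * np.outer(delta, x)
        self.y = y_new
        return y_new
```

With k = 1 this reduces to an answer network that sees only the most recent action and observation, which is the original TD-network setting the paper generalizes.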

Cite

Text

Tanner and Sutton. "Temporal-Difference Networks with History." International Joint Conference on Artificial Intelligence, 2005.

Markdown

[Tanner and Sutton. "Temporal-Difference Networks with History." International Joint Conference on Artificial Intelligence, 2005.](https://mlanthology.org/ijcai/2005/tanner2005ijcai-temporal/)

BibTeX

@inproceedings{tanner2005ijcai-temporal,
  title     = {{Temporal-Difference Networks with History}},
  author    = {Tanner, Brian and Sutton, Richard S.},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2005},
  pages     = {865--870},
  url       = {https://mlanthology.org/ijcai/2005/tanner2005ijcai-temporal/}
}