Temporal Difference Learning in Continuous Time and Space

Abstract

A continuous-time, continuous-state version of the temporal difference (TD) algorithm is derived in order to facilitate the application of reinforcement learning to real-world control tasks and neurobiological modeling. An optimal nonlinear feedback control law was also derived using the derivatives of the value function. The performance of the algorithms was tested in a task of swinging up a pendulum with limited torque. Both the "critic" that specifies the paths to the upright position and the "actor" that works as a nonlinear feedback controller were successfully implemented by radial basis function (RBF) networks.
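As a rough illustration of the approach summarized above, the sketch below trains a continuous-time TD critic with RBF features on a pendulum swing-up task, Euler-discretized in Python. The TD error δ(t) = r(t) − V(t)/τ + dV/dt and the torque-limited feedback computed from the value gradient follow the standard continuous-time TD formulation; the pendulum constants, RBF grid, learning rate, and the finite-difference actor are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch (assumed constants, not the paper's implementation) of a
# continuous-time TD critic with RBF features on pendulum swing-up.
import numpy as np

# Pendulum parameters (illustrative values)
m, l, g, mu = 1.0, 1.0, 9.8, 0.01      # mass, length, gravity, friction
u_max = 5.0                             # limited torque
dt, tau = 0.02, 1.0                     # Euler step, reward discount time constant

# RBF features over the state x = (theta, theta_dot)
centers_th = np.linspace(-np.pi, np.pi, 12)
centers_om = np.linspace(-8.0, 8.0, 12)
C = np.array([(a, b) for a in centers_th for b in centers_om])
sigma = np.array([0.5, 1.5])

def phi(x):
    d = (x - C) / sigma
    return np.exp(-0.5 * np.sum(d * d, axis=1))

w = np.zeros(len(C))                    # critic weights: V(x) = w . phi(x)
alpha = 0.1                             # learning rate

def value(x):
    return w @ phi(x)

def reward(x):
    theta, _ = x
    return np.cos(theta)                # +1 at upright, -1 hanging down

def policy(x, eps=1e-3):
    # Torque-limited feedback from the value gradient dV/d(theta_dot),
    # estimated by finite differences (a simple stand-in for the RBF actor).
    dV = (value(x + np.array([0.0, eps])) - value(x - np.array([0.0, eps]))) / (2 * eps)
    return u_max * np.tanh(dV)

def step(x, u):
    theta, omega = x
    domega = (-mu * omega + m * g * l * np.sin(theta) + u) / (m * l ** 2)
    theta += omega * dt
    omega += domega * dt
    return np.array([np.arctan2(np.sin(theta), np.cos(theta)), omega])

x = np.array([np.pi, 0.0])              # start hanging straight down
for t in range(20000):
    u = policy(x)
    x_next = step(x, u)
    # Continuous-time TD error, Euler-discretized:
    # delta = r - V/tau + (V(x') - V(x)) / dt
    delta = reward(x) - value(x) / tau + (value(x_next) - value(x)) / dt
    w += alpha * delta * phi(x) * dt    # critic update along the feature gradient
    x = x_next
```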

Cite

Text

Doya. "Temporal Difference Learning in Continuous Time and Space." Neural Information Processing Systems, 1995.

Markdown

[Doya. "Temporal Difference Learning in Continuous Time and Space." Neural Information Processing Systems, 1995.](https://mlanthology.org/neurips/1995/doya1995neurips-temporal/)

BibTeX

@inproceedings{doya1995neurips-temporal,
  title     = {{Temporal Difference Learning in Continuous Time and Space}},
  author    = {Doya, Kenji},
  booktitle = {Neural Information Processing Systems},
  year      = {1995},
  pages     = {1073-1079},
  url       = {https://mlanthology.org/neurips/1995/doya1995neurips-temporal/}
}