Temporal Difference Learning in Continuous Time and Space

Abstract

A continuous-time, continuous-state version of the temporal difference (TD) algorithm is derived in order to facilitate the application of reinforcement learning to real-world control tasks and neurobiological modeling. An optimal nonlinear feedback control law was also derived using the derivatives of the value function. The performance of the algorithms was tested in a task of swinging up a pendulum with limited torque. Both the "critic" that specifies the paths to the upright position and the "actor" that works as a nonlinear feedback controller were successfully implemented by radial basis function (RBF) networks.
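As a rough illustration of the approach summarized above, the sketch below trains a continuous-time TD critic with RBF features on a pendulum swing-up task, Euler-discretized in Python. The TD error δ(t) = r(t) − V(t)/τ + dV/dt and the torque-limited feedback computed from the value gradient follow the standard continuous-time TD formulation; the pendulum constants, RBF grid, learning rate, and the finite-difference actor are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch (assumed constants, not the paper's implementation) of a
# continuous-time TD critic with RBF features on pendulum swing-up.
import numpy as np

# Pendulum parameters (illustrative values)
m, l, g, mu = 1.0, 1.0, 9.8, 0.01      # mass, length, gravity, friction
u_max = 5.0                             # limited torque
dt, tau = 0.02, 1.0                     # Euler step, reward discount time constant

# RBF features over the state x = (theta, theta_dot)
centers_th = np.linspace(-np.pi, np.pi, 12)
centers_om = np.linspace(-8.0, 8.0, 12)
C = np.array([(a, b) for a in centers_th for b in centers_om])
sigma = np.array([0.5, 1.5])

def phi(x):
    d = (x - C) / sigma
    return np.exp(-0.5 * np.sum(d * d, axis=1))

w = np.zeros(len(C))                    # critic weights: V(x) = w . phi(x)
alpha = 0.1                             # learning rate

def value(x):
    return w @ phi(x)

def reward(x):
    theta, _ = x
    return np.cos(theta)                # +1 at upright, -1 hanging down

def policy(x, eps=1e-3):
    # Torque-limited feedback from the value gradient dV/d(theta_dot),
    # estimated by finite differences (a simple stand-in for the RBF actor).
    dV = (value(x + np.array([0.0, eps])) - value(x - np.array([0.0, eps]))) / (2 * eps)
    return u_max * np.tanh(dV)

def step(x, u):
    theta, omega = x
    domega = (-mu * omega + m * g * l * np.sin(theta) + u) / (m * l ** 2)
    theta += omega * dt
    omega += domega * dt
    return np.array([np.arctan2(np.sin(theta), np.cos(theta)), omega])

x = np.array([np.pi, 0.0])              # start hanging straight down
for t in range(20000):
    u = policy(x)
    x_next = step(x, u)
    # Continuous-time TD error, Euler-discretized:
    # delta = r - V/tau + (V(x') - V(x)) / dt
    delta = reward(x) - value(x) / tau + (value(x_next) - value(x)) / dt
    w += alpha * delta * phi(x) * dt    # critic update along the feature gradient
    x = x_next
```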

Cite

Text

Doya. "Temporal Difference Learning in Continuous Time and Space." Neural Information Processing Systems, 1995.

Markdown

[Doya. "Temporal Difference Learning in Continuous Time and Space." Neural Information Processing Systems, 1995.](https://mlanthology.org/neurips/1995/doya1995neurips-temporal/)

BibTeX

@inproceedings{doya1995neurips-temporal,
  title     = {{Temporal Difference Learning in Continuous Time and Space}},
  author    = {Doya, Kenji},
  booktitle = {Neural Information Processing Systems},
  year      = {1995},
  pages     = {1073-1079},
  url       = {https://mlanthology.org/neurips/1995/doya1995neurips-temporal/}
}