Temporal Difference Learning in Continuous Time and Space
Abstract
A continuous-time, continuous-state version of the temporal difference (TD) algorithm is derived in order to facilitate the application of reinforcement learning to real-world control tasks and neurobiological modeling. An optimal nonlinear feedback control law was also derived using the derivatives of the value function. The performance of the algorithms was tested in a task of swinging up a pendulum with limited torque. Both the "critic" that specifies the paths to the upright position and the "actor" that works as a nonlinear feedback controller were successfully implemented by radial basis function (RBF) networks.
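As a rough guide to the formulation summarized above, the continuous-time analogue of TD learning can be sketched as follows; the symbols τ (discount time constant), f (system dynamics), and δ (TD error) are assumed notation for this sketch rather than quoted from the abstract. The value function is an exponentially discounted integral of future reward, differentiating it in time gives a consistency condition, and the deviation from that condition serves as the TD error that trains the critic; the feedback control law then comes from maximizing with respect to the control using the value gradient, which is how "the derivatives of the value function" enter the actor:

    V(x(t)) = \int_t^{\infty} e^{-(s-t)/\tau}\, r(x(s), u(s))\, ds

    \delta(t) = r(t) - \frac{1}{\tau} V(t) + \dot{V}(t)

    u(t) = \arg\max_{u} \left[ r(x(t), u) + \frac{\partial V}{\partial x}\, f(x(t), u) \right]

When δ(t) = 0 the value estimate is self-consistent along the trajectory; a positive δ(t) signals that the outcome is better than predicted, and both the RBF critic and the nonlinear feedback actor can be updated in proportion to it.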
Cite
Text
Doya. "Temporal Difference Learning in Continuous Time and Space." Neural Information Processing Systems, 1995.Markdown
[Doya. "Temporal Difference Learning in Continuous Time and Space." Neural Information Processing Systems, 1995.](https://mlanthology.org/neurips/1995/doya1995neurips-temporal/)BibTeX
@inproceedings{doya1995neurips-temporal,
title = {{Temporal Difference Learning in Continuous Time and Space}},
author = {Doya, Kenji},
booktitle = {Neural Information Processing Systems},
year = {1995},
pages = {1073-1079},
url = {https://mlanthology.org/neurips/1995/doya1995neurips-temporal/}
}