Analytical Mean Squared Error Curves in Temporal Difference Learning
Abstract
We have calculated analytical expressions for how the bias and variance of the estimators provided by various temporal difference value estimation algorithms change with offline updates over trials in absorbing Markov chains using lookup table representations. We illustrate classes of learning curve behavior in various chains, and show the manner in which TD is sensitive to the choice of its step-size and eligibility trace parameters.
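The paper derives these bias and variance curves analytically. As a rough empirical counterpart, the sketch below estimates the same quantities by Monte Carlo: it runs lookup-table TD(λ) with offline updates many times and measures the per-state bias, variance and mean squared error of the value estimates. Everything specific here is an assumption for illustration, not the authors' setup. That includes the five-state symmetric random walk, the initial value 0.5, accumulating eligibility traces, and the helper names `random_walk_trial`, `offline_td_lambda` and `empirical_bias_variance`.

```python
import numpy as np


def random_walk_trial(rng, n_states=5):
    """Simulate one trial of a symmetric random walk (hypothetical example chain).

    Nonterminal states are 0..n_states-1; the walk starts in the middle and is
    absorbed off either end. Returns the visited nonterminal states and the
    terminal reward (1.0 for exiting right, 0.0 for exiting left).
    """
    s = n_states // 2
    states = [s]
    while True:
        s += 1 if rng.random() < 0.5 else -1
        if s < 0:
            return states, 0.0
        if s >= n_states:
            return states, 1.0
        states.append(s)


def offline_td_lambda(rng, n_trials, alpha, lam, n_states=5, v_init=0.5):
    """Lookup-table TD(lambda) with accumulating traces and offline updates:
    value changes are summed within a trial and applied only at its end."""
    V = np.full(n_states, v_init)
    for _ in range(n_trials):
        states, reward = random_walk_trial(rng, n_states)
        dV = np.zeros(n_states)
        trace = np.zeros(n_states)
        for t, s in enumerate(states):
            trace *= lam          # undiscounted (gamma = 1) absorbing chain
            trace[s] += 1.0
            if t + 1 < len(states):
                td_error = V[states[t + 1]] - V[s]   # zero intermediate reward
            else:
                td_error = reward - V[s]             # terminal state has value 0
            dV += alpha * td_error * trace
        V += dV   # offline: apply the accumulated change once per trial
    return V


def empirical_bias_variance(n_runs=500, n_trials=20, alpha=0.1, lam=0.5,
                            n_states=5, seed=0):
    """Estimate per-state bias, variance and MSE of V after n_trials by
    averaging over n_runs independent runs (Monte Carlo, not analytical)."""
    rng = np.random.default_rng(seed)
    # True values of the symmetric walk: probability of exiting on the right.
    true_v = np.arange(1, n_states + 1) / (n_states + 1)
    est = np.array([offline_td_lambda(rng, n_trials, alpha, lam, n_states)
                    for _ in range(n_runs)])
    bias = est.mean(axis=0) - true_v
    var = est.var(axis=0)                       # population variance (ddof=0)
    mse = ((est - true_v) ** 2).mean(axis=0)    # equals bias**2 + var
    return bias, var, mse


if __name__ == "__main__":
    for lam in (0.0, 0.5, 1.0):
        bias, var, mse = empirical_bias_variance(lam=lam)
        print(f"lambda={lam}: mean MSE over states = {mse.mean():.4f}")
```

Sweeping `alpha` and `lam`, or recording the MSE after each value of `n_trials`, gives sampled learning curves. These can be set against the kind of step-size and trace-parameter sensitivity the abstract describes. Such curves are only noisy estimates, whereas the paper's expressions are exact.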
Cite
Singh and Dayan. "Analytical Mean Squared Error Curves in Temporal Difference Learning." Neural Information Processing Systems, 1996.
@inproceedings{singh1996neurips-analytical,
title = {{Analytical Mean Squared Error Curves in Temporal Difference Learning}},
author = {Singh, Satinder P. and Dayan, Peter},
booktitle = {Neural Information Processing Systems},
year = {1996},
pages = {1054-1060},
url = {https://mlanthology.org/neurips/1996/singh1996neurips-analytical/}
}