Temporal Coherence and Prediction Decay in TD Learning
Abstract
This paper describes improvements to the temporal difference learning method TD(λ). In its standard form, TD(λ) requires two control parameters, the learning rate and the temporal discount, to be chosen appropriately. These parameters can have a major effect on performance, particularly the learning rate, which affects the stability of the process as well as the number of observations required. Our extension to the TD(λ) algorithm automatically sets and subsequently adjusts these parameters. The learning rate adjustment is based on a new concept we call temporal coherence (TC). The experiments reported here compare the performance of the extended TD(λ) algorithm against human-chosen parameters and against an earlier method for learning rate adjustment, in a complex game domain. The task was to learn the relative values of pieces from self-play only, without any initial domain-specific knowledge. The results show that the improved method leads to better learning (i.e. faster and less subject to the effects of noise) than either human-chosen values for the control parameters or the comparison method.
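For orientation, the two control parameters mentioned in the abstract can be seen in a minimal sketch of the standard TD(λ) update with fixed, hand-chosen values. This is not the authors' extended method and does not implement temporal coherence. It assumes a linear evaluation function, an update applied once at the end of each episode, and that the "temporal discount" corresponds to the λ trace-decay parameter. The function name `td_lambda_episode` and the parameter names `alpha` and `lam` are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def td_lambda_episode(
    weights: Sequence[float],
    features: Sequence[Sequence[float]],
    outcome: float,
    alpha: float = 0.1,  # learning rate (hand-chosen here; an assumption)
    lam: float = 0.7,    # trace decay, assumed to be the "temporal discount"
) -> List[float]:
    """Return updated weights after one episode of standard TD(lambda).

    Predictions are linear: P_t = weights . features[t]. The prediction
    after the last position is replaced by the final outcome.
    """
    # Predictions for every position, computed with the pre-episode weights.
    preds = [sum(w * x for w, x in zip(weights, f)) for f in features]
    # Each prediction is trained toward its successor; the last toward the outcome.
    targets = preds[1:] + [outcome]

    trace = [0.0] * len(weights)    # decayed sum of past gradients
    delta_w = [0.0] * len(weights)  # accumulated weight change for the episode

    for x, p, nxt in zip(features, preds, targets):
        # For a linear predictor the gradient w.r.t. the weights is x itself.
        trace = [lam * e + xi for e, xi in zip(trace, x)]
        td_error = nxt - p
        delta_w = [d + alpha * td_error * e for d, e in zip(delta_w, trace)]

    return [w + d for w, d in zip(weights, delta_w)]
```

In this sketch a larger `alpha` makes each episode move the weights further, which speeds learning but amplifies noise, and `lam` controls how far a temporal difference is propagated back to earlier positions. These are the values the abstract says the extended algorithm sets and adjusts automatically.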
Cite
Text
Beal and Smith. "Temporal Coherence and Prediction Decay in TD Learning." International Joint Conference on Artificial Intelligence, 1999.
Markdown
[Beal and Smith. "Temporal Coherence and Prediction Decay in TD Learning." International Joint Conference on Artificial Intelligence, 1999.](https://mlanthology.org/ijcai/1999/beal1999ijcai-temporal/)
BibTeX
@inproceedings{beal1999ijcai-temporal,
title = {{Temporal Coherence and Prediction Decay in TD Learning}},
author = {Beal, Donald F. and Smith, Martin C.},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {1999},
pages = {564-569},
url = {https://mlanthology.org/ijcai/1999/beal1999ijcai-temporal/}
}