Optimizing Parameter Learning Using Temporal Differences

Abstract

Temporal difference algorithms are useful for predicting an outcome from some pattern, such as a vector of evaluation parameters applied to the leaf nodes of a state-space search. As time progresses, the vector converges toward an optimal state in which program performance peaks. Temporal difference algorithms continually modify the weights of a differentiable, continuous evaluation function. As De Jong and Schultz point out, expert systems that rely on experience-based learning mechanisms are more useful in the field than systems that rely on growing knowledge bases (De Jong and Schultz 1988). This research focuses on applying the TDLeaf algorithm to the domain of computer chess. In this poster I present empirical data showing the evolution of a vector of evaluation weights and the associated performance ratings under a variety of conditions.
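To make the idea concrete, here is a minimal sketch of a TDLeaf(λ)-style weight update, under the common simplifying assumption of a linear evaluation function (so the gradient of the evaluation with respect to the weights is just the feature vector). The feature vectors, learning rate, and λ below are illustrative placeholders, not values from the paper:

```python
import numpy as np

def tdleaf_update(features, weights, alpha=0.01, lam=0.7):
    """One batch TDLeaf(lambda)-style update over a sequence of positions.

    features: array of shape (T, n) -- feature vectors of the principal-
    variation leaf positions reached after each move of one game
    (illustrative; the paper's actual feature set is not reproduced here).
    Assumes a linear evaluation eval(x) = w . x, so grad_w eval(x) = x.
    """
    evals = features @ weights   # evaluation of each leaf position
    diffs = np.diff(evals)       # temporal differences d_t = eval_{t+1} - eval_t
    T = len(diffs)
    for t in range(T):
        # lambda-discounted sum of future temporal differences
        target = sum(lam ** (j - t) * diffs[j] for j in range(t, T))
        # move each weight along the gradient, scaled by the target
        weights = weights + alpha * features[t] * target
    return weights
```

With λ near 1 the update credits early positions for the final outcome (closer to supervised learning on the game result); with λ near 0 each position is trained only toward the evaluation of its successor.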

Cite

Text

Swafford II, James F. "Optimizing Parameter Learning Using Temporal Differences." AAAI Conference on Artificial Intelligence, 2002. doi:10.5555/777092.777245

Markdown

[Swafford II, James F. "Optimizing Parameter Learning Using Temporal Differences." AAAI Conference on Artificial Intelligence, 2002.](https://mlanthology.org/aaai/2002/ii2002aaai-optimizing/) doi:10.5555/777092.777245

BibTeX

@inproceedings{ii2002aaai-optimizing,
  title     = {{Optimizing Parameter Learning Using Temporal Differences}},
  author    = {Swafford, II, James F.},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2002},
  pages     = {965--966},
  doi       = {10.5555/777092.777245},
  url       = {https://mlanthology.org/aaai/2002/ii2002aaai-optimizing/}
}