Speeding up Q(lambda)-Learning
Abstract
Q(λ)-learning uses TD(λ)-methods to accelerate Q-Learning. The worst case complexity for a single update step of previous online Q(λ) implementations based on lookup-tables is bounded by the size of the state/action space. Our faster algorithm's worst case complexity is bounded by the number of actions. The algorithm is based on the observation that Q-value updates may be postponed until they are needed.
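To illustrate the core observation, here is a minimal Python sketch of what postponing Q-value updates can mean for tabular Q(λ). This is our own reading of the abstract, not the paper's exact algorithm: it uses replacing traces, ignores episode boundaries and trace cutoffs, and all names are hypothetical. Instead of decaying every eligibility trace on every step (an O(|S|·|A|) sweep), it keeps one global running sum of decay-weighted TD errors and settles an entry's accumulated update lazily, only when that entry is next touched.

```python
import random

class LazyQLambda:
    """Lazy tabular Q(lambda) sketch: per-step work is O(1) per table
    access instead of a sweep over the whole state/action space."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9, lam=0.8):
        self.alpha = alpha
        self.gl = gamma * lam                      # per-step trace decay
        self.Q = [[0.0] * n_actions for _ in range(n_states)]
        self.last = {}   # (s, a) -> (trace at last visit, step, S at last visit)
        self.t = 0       # global step counter
        self.S = 0.0     # S_t = sum_k delta_k * gl**k

    def _settle(self, s, a):
        """Apply all updates postponed since (s, a) was last touched."""
        if (s, a) in self.last:
            e0, t0, S0 = self.last[(s, a)]
            # equals sum_{k > t0} alpha * delta_k * e0 * gl**(k - t0)
            self.Q[s][a] += self.alpha * e0 * (self.S - S0) / self.gl ** t0

    def step(self, s, a, delta):
        """Record one visit to (s, a) with TD error delta."""
        self.t += 1
        self._settle(s, a)                         # catch up on old updates
        self.S += delta * self.gl ** self.t        # advance the global sum
        self.Q[s][a] += self.alpha * delta         # replacing trace: e = 1 now
        self.last[(s, a)] = (1.0, self.t, self.S)  # this delta already applied

    def value(self, s, a):
        """Settle pending updates before reading Q(s, a)."""
        self._settle(s, a)
        if (s, a) in self.last:
            e0, t0, _ = self.last[(s, a)]
            self.last[(s, a)] = (e0 * self.gl ** (self.t - t0), self.t, self.S)
        return self.Q[s][a]


# Cross-check against a naive per-step sweep over the whole table
# (replacing traces in both versions) on a random TD-error sequence.
random.seed(0)
fast = LazyQLambda(4, 2)
ref = [[0.0] * 2 for _ in range(4)]
trace = [[0.0] * 2 for _ in range(4)]
for _ in range(40):
    s, a, d = random.randrange(4), random.randrange(2), random.uniform(-1, 1)
    fast.step(s, a, d)
    trace[s][a] = 1.0                              # replacing trace
    for i in range(4):
        for j in range(2):
            ref[i][j] += 0.1 * d * trace[i][j]     # full O(|S||A|) sweep
            trace[i][j] *= 0.9 * 0.8
max_diff = max(abs(fast.value(i, j) - ref[i][j])
               for i in range(4) for j in range(2))
```

Both versions compute the same Q-table; the lazy one touches only the visited entry per step. Note the division by `gl ** t0` can underflow over very long runs, which is one reason the published algorithm needs additional bookkeeping.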
Cite
Text
Wiering and Schmidhuber. "Speeding up Q(lambda)-Learning." European Conference on Machine Learning, 1998. doi:10.1007/BFB0026706
Markdown
[Wiering and Schmidhuber. "Speeding up Q(lambda)-Learning." European Conference on Machine Learning, 1998.](https://mlanthology.org/ecmlpkdd/1998/wiering1998ecml-speeding/) doi:10.1007/BFB0026706
BibTeX
@inproceedings{wiering1998ecml-speeding,
title = {{Speeding up Q(lambda)-Learning}},
author = {Wiering, Marco A. and Schmidhuber, Jürgen},
booktitle = {European Conference on Machine Learning},
year = {1998},
pages = {352--363},
doi = {10.1007/BFB0026706},
url = {https://mlanthology.org/ecmlpkdd/1998/wiering1998ecml-speeding/}
}