Speeding up Q(lambda)-Learning
Abstract
Q(λ)-learning uses TD(λ)-methods to accelerate Q-Learning. The worst case complexity for a single update step of previous online Q(λ) implementations based on lookup-tables is bounded by the size of the state/action space. Our faster algorithm's worst case complexity is bounded by the number of actions. The algorithm is based on the observation that Q-value updates may be postponed until they are needed.
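To illustrate the core observation, here is a minimal Python sketch of what postponing Q-value updates can mean for tabular Q(λ). This is our own reading of the abstract, not the paper's exact algorithm: it uses replacing traces, ignores episode boundaries and trace cutoffs, and all names are hypothetical. Instead of decaying every eligibility trace on every step (an O(|S|·|A|) sweep), it keeps one global running sum of decay-weighted TD errors and settles an entry's accumulated update lazily, only when that entry is next touched.

```python
import random

class LazyQLambda:
    """Lazy tabular Q(lambda) sketch: per-step work is O(1) per table
    access instead of a sweep over the whole state/action space."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9, lam=0.8):
        self.alpha = alpha
        self.gl = gamma * lam                      # per-step trace decay
        self.Q = [[0.0] * n_actions for _ in range(n_states)]
        self.last = {}   # (s, a) -> (trace at last visit, step, S at last visit)
        self.t = 0       # global step counter
        self.S = 0.0     # S_t = sum_k delta_k * gl**k

    def _settle(self, s, a):
        """Apply all updates postponed since (s, a) was last touched."""
        if (s, a) in self.last:
            e0, t0, S0 = self.last[(s, a)]
            # equals sum_{k > t0} alpha * delta_k * e0 * gl**(k - t0)
            self.Q[s][a] += self.alpha * e0 * (self.S - S0) / self.gl ** t0

    def step(self, s, a, delta):
        """Record one visit to (s, a) with TD error delta."""
        self.t += 1
        self._settle(s, a)                         # catch up on old updates
        self.S += delta * self.gl ** self.t        # advance the global sum
        self.Q[s][a] += self.alpha * delta         # replacing trace: e = 1 now
        self.last[(s, a)] = (1.0, self.t, self.S)  # this delta already applied

    def value(self, s, a):
        """Settle pending updates before reading Q(s, a)."""
        self._settle(s, a)
        if (s, a) in self.last:
            e0, t0, _ = self.last[(s, a)]
            self.last[(s, a)] = (e0 * self.gl ** (self.t - t0), self.t, self.S)
        return self.Q[s][a]


# Cross-check against a naive per-step sweep over the whole table
# (replacing traces in both versions) on a random TD-error sequence.
random.seed(0)
fast = LazyQLambda(4, 2)
ref = [[0.0] * 2 for _ in range(4)]
trace = [[0.0] * 2 for _ in range(4)]
for _ in range(40):
    s, a, d = random.randrange(4), random.randrange(2), random.uniform(-1, 1)
    fast.step(s, a, d)
    trace[s][a] = 1.0                              # replacing trace
    for i in range(4):
        for j in range(2):
            ref[i][j] += 0.1 * d * trace[i][j]     # full O(|S||A|) sweep
            trace[i][j] *= 0.9 * 0.8
max_diff = max(abs(fast.value(i, j) - ref[i][j])
               for i in range(4) for j in range(2))
```

Both versions compute the same Q-table; the lazy one touches only the visited entry per step. Note the division by `gl ** t0` can underflow over very long runs, which is one reason the published algorithm needs additional bookkeeping.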
Cite
Text
Wiering and Schmidhuber. "Speeding up Q(lambda)-Learning." European Conference on Machine Learning, 1998. doi:10.1007/BFB0026706
Markdown
[Wiering and Schmidhuber. "Speeding up Q(lambda)-Learning." European Conference on Machine Learning, 1998.](https://mlanthology.org/ecmlpkdd/1998/wiering1998ecml-speeding/) doi:10.1007/BFB0026706
BibTeX
@inproceedings{wiering1998ecml-speeding,
title = {{Speeding up Q(lambda)-Learning}},
author = {Wiering, Marco A. and Schmidhuber, Jürgen},
booktitle = {European Conference on Machine Learning},
year = {1998},
pages = {352--363},
doi = {10.1007/BFB0026706},
url = {https://mlanthology.org/ecmlpkdd/1998/wiering1998ecml-speeding/}
}