Fast Online Q(lambda)

Abstract

Q(λ)-learning uses TD(λ)-methods to accelerate Q-learning. The update complexity of previous online Q(λ) implementations based on lookup tables is bounded by the size of the state/action space. Our faster algorithm's update complexity is bounded by the number of actions. The method is based on the observation that Q-value updates may be postponed until they are needed.
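
The idea of postponing updates can be illustrated with a minimal sketch. It is not the paper's full algorithm. It assumes accumulating eligibility traces and, to isolate the bookkeeping, takes the TD errors as given numbers rather than computing them from Q-values. All names (`eager_run`, `lazy_run`, `big_delta`, `phi`, `scaled_trace`, `seen_delta`) are hypothetical. The eager version touches every traced entry at every step. The lazy version keeps one global running sum of discounted TD errors and brings an entry up to date only when that entry is visited or read.

```python
from collections import defaultdict


def eager_run(visits, errors, alpha, gamma, lam):
    """Reference scheme: decay and apply every trace at every step."""
    q = defaultdict(float)
    trace = defaultdict(float)
    for key, err in zip(visits, errors):
        for k in trace:
            trace[k] *= gamma * lam          # decay all existing traces
        trace[key] += 1.0                    # accumulating trace for the visited entry
        for k, tr in trace.items():
            q[k] += alpha * err * tr         # cost grows with the number of traced entries
    return dict(q)


def lazy_run(visits, errors, alpha, gamma, lam):
    """Postponed scheme: per-step work touches only the visited entry."""
    q = defaultdict(float)
    scaled_trace = defaultdict(float)  # trace divided by phi at the time of each visit
    seen_delta = defaultdict(float)    # value of big_delta when the entry was last updated
    big_delta = 0.0                    # running sum of err_t * (gamma * lam) ** t
    phi = 1.0                          # (gamma * lam) ** t

    def bring_up_to_date(k):
        # Apply all TD errors accumulated since k was last touched.
        q[k] += alpha * scaled_trace[k] * (big_delta - seen_delta[k])
        seen_delta[k] = big_delta

    for key, err in zip(visits, errors):
        bring_up_to_date(key)              # settle pending updates before changing the trace
        scaled_trace[key] += 1.0 / phi
        big_delta += err * phi
        phi *= gamma * lam

    # Flush every entry once at the end so the result can be compared with eager_run.
    for k in list(scaled_trace):
        bring_up_to_date(k)
    return dict(q)


if __name__ == "__main__":
    visits = ["a", "b", "a", "c", "b"]
    errors = [0.5, -1.0, 0.25, 2.0, -0.75]
    print(eager_run(visits, errors, 0.1, 0.9, 0.8))
    print(lazy_run(visits, errors, 0.1, 0.9, 0.8))
```

Both functions should agree up to floating-point rounding. In this sketch `phi` shrinks geometrically, so a long-running version would need to rescale it periodically to avoid underflow; that detail is omitted here.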

Cite

Text

Wiering and Schmidhuber. "Fast Online Q(lambda)." Machine Learning, 1998. doi:10.1023/A:1007562800292

Markdown

[Wiering and Schmidhuber. "Fast Online Q(lambda)." Machine Learning, 1998.](https://mlanthology.org/mlj/1998/wiering1998mlj-fast/) doi:10.1023/A:1007562800292

BibTeX

@article{wiering1998mlj-fast,
  title     = {{Fast Online Q(lambda)}},
  author    = {Wiering, Marco A. and Schmidhuber, Jürgen},
  journal   = {Machine Learning},
  year      = {1998},
  pages     = {105--115},
  doi       = {10.1023/A:1007562800292},
  volume    = {33},
  url       = {https://mlanthology.org/mlj/1998/wiering1998mlj-fast/}
}