Asynchronous Stochastic Approximation and Q-Learning
Abstract
We provide some general results on the convergence of a class of stochastic approximation algorithms and their parallel and asynchronous variants. We then use these results to study the Q-learning algorithm, a reinforcement learning method for solving Markov decision problems, and establish its convergence under conditions more general than previously available.
Cite
Text
Tsitsiklis. "Asynchronous Stochastic Approximation and Q-Learning." Machine Learning, 1994. doi:10.1007/BF00993306
Markdown
[Tsitsiklis. "Asynchronous Stochastic Approximation and Q-Learning." Machine Learning, 1994.](https://mlanthology.org/mlj/1994/tsitsiklis1994mlj-asynchronous/) doi:10.1007/BF00993306
BibTeX
@article{tsitsiklis1994mlj-asynchronous,
title = {{Asynchronous Stochastic Approximation and Q-Learning}},
author = {Tsitsiklis, John N.},
journal = {Machine Learning},
year = {1994},
pages = {185--202},
doi = {10.1007/BF00993306},
volume = {16},
url = {https://mlanthology.org/mlj/1994/tsitsiklis1994mlj-asynchronous/}
}