An Analysis of Reinforcement Learning with Function Approximation
Abstract
We address the problem of computing the optimal Q-function in Markov decision problems with infinite state-space. We analyze the convergence properties of several variations of Q-learning when combined with function approximation, extending the analysis of TD-learning in (Tsitsiklis and Van Roy, 1996) to stochastic control settings. We identify conditions under which such approximate methods converge with probability 1. We conclude with a brief discussion on the general applicability of our results and compare them with several related works.
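To make the setting concrete, below is a minimal sketch of Q-learning with linear function approximation, the class of methods the paper analyzes. The toy two-state MDP, the one-hot features, the uniform exploration policy, and the constant step size are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 2, 2
# P[s, a] is a distribution over next states (assumed toy MDP)
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.4, 0.6]]])
R = np.array([[1.0, 0.0], [0.0, 1.0]])  # reward r(s, a)
gamma = 0.9

def phi(s, a):
    """One-hot features over (state, action) pairs; with an exact
    representation like this the approximation error is zero."""
    f = np.zeros(n_states * n_actions)
    f[s * n_actions + a] = 1.0
    return f

theta = np.zeros(n_states * n_actions)  # linear weights, Q(s,a) ≈ phi(s,a)·theta
s = 0
for t in range(20000):
    a = rng.integers(n_actions)              # uniform exploration policy
    s_next = rng.choice(n_states, p=P[s, a])
    q_next = max(phi(s_next, b) @ theta for b in range(n_actions))
    delta = R[s, a] + gamma * q_next - phi(s, a) @ theta  # TD error
    theta += 0.1 * delta * phi(s, a)         # constant step size (assumption)
    s = s_next

Q = theta.reshape(n_states, n_actions)       # estimated Q-function
```

With one-hot features the update reduces to tabular Q-learning, so it converges here; the paper's contribution is identifying conditions under which convergence still holds with genuinely approximate (e.g. low-dimensional linear) features.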
Cite
Text
Melo et al. "An Analysis of Reinforcement Learning with Function Approximation." International Conference on Machine Learning, 2008. doi:10.1145/1390156.1390240
Markdown
[Melo et al. "An Analysis of Reinforcement Learning with Function Approximation." International Conference on Machine Learning, 2008.](https://mlanthology.org/icml/2008/melo2008icml-analysis/) doi:10.1145/1390156.1390240
BibTeX
@inproceedings{melo2008icml-analysis,
title = {{An Analysis of Reinforcement Learning with Function Approximation}},
author = {Melo, Francisco S. and Meyn, Sean P. and Ribeiro, M. Isabel},
booktitle = {International Conference on Machine Learning},
year = {2008},
pages = {664--671},
doi = {10.1145/1390156.1390240},
url = {https://mlanthology.org/icml/2008/melo2008icml-analysis/}
}