Convergence of Stochastic Iterative Dynamic Programming Algorithms
Abstract
Increasing attention has recently been paid to algorithms based on dynamic programming (DP) due to the suitability of DP for learning problems involving control. In stochastic environments where the system being controlled is only incompletely known, however, a unifying theoretical account of these methods has been missing. In this paper we relate DP-based learning algorithms to the powerful techniques of stochastic approximation via a new convergence theorem, enabling us to establish a class of convergent algorithms to which both TD(λ) and Q-learning belong.
Cite
Text
Jaakkola et al. "Convergence of Stochastic Iterative Dynamic Programming Algorithms." Neural Information Processing Systems, 1993.
Markdown
[Jaakkola et al. "Convergence of Stochastic Iterative Dynamic Programming Algorithms." Neural Information Processing Systems, 1993.](https://mlanthology.org/neurips/1993/jaakkola1993neurips-convergence/)
BibTeX
@inproceedings{jaakkola1993neurips-convergence,
title = {{Convergence of Stochastic Iterative Dynamic Programming Algorithms}},
author = {Jaakkola, Tommi and Jordan, Michael I. and Singh, Satinder P.},
booktitle = {Neural Information Processing Systems},
year = {1993},
pages = {703-710},
url = {https://mlanthology.org/neurips/1993/jaakkola1993neurips-convergence/}
}