Convergence of Stochastic Iterative Dynamic Programming Algorithms

Abstract

Increasing attention has recently been paid to algorithms based on dynamic programming (DP) due to the suitability of DP for learning problems involving control. In stochastic environments where the system being controlled is only incompletely known, however, a unifying theoretical account of these methods has been missing. In this paper we relate DP-based learning algorithms to the powerful techniques of stochastic approximation via a new convergence theorem, enabling us to establish a class of convergent algorithms to which both TD(λ) and Q-learning belong.

Cite

Text

Jaakkola et al. "Convergence of Stochastic Iterative Dynamic Programming Algorithms." Neural Information Processing Systems, 1993.

Markdown

[Jaakkola et al. "Convergence of Stochastic Iterative Dynamic Programming Algorithms." Neural Information Processing Systems, 1993.](https://mlanthology.org/neurips/1993/jaakkola1993neurips-convergence/)

BibTeX

@inproceedings{jaakkola1993neurips-convergence,
  title     = {{Convergence of Stochastic Iterative Dynamic Programming Algorithms}},
  author    = {Jaakkola, Tommi and Jordan, Michael I. and Singh, Satinder P.},
  booktitle = {Neural Information Processing Systems},
  year      = {1993},
  pages     = {703-710},
  url       = {https://mlanthology.org/neurips/1993/jaakkola1993neurips-convergence/}
}