Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms
Abstract
An important application of reinforcement learning (RL) is to finite-state control problems, and one of the most difficult problems in learning for control is balancing the exploration/exploitation tradeoff. Existing theoretical results for RL give very little guidance on reasonable ways to perform exploration. In this paper, we examine the convergence of single-step on-policy RL algorithms for control. On-policy algorithms cannot separate exploration from learning and therefore must confront the exploration problem directly. We prove convergence results for several related on-policy algorithms with both decaying exploration and persistent exploration. We also provide examples of exploration strategies that can be followed during learning that result in convergence to both optimal values and optimal policies.
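The canonical single-step on-policy algorithm covered by these results is Sarsa(0), and the decaying-exploration condition the paper formalizes is GLIE ("greedy in the limit with infinite exploration"). Below is a minimal tabular sketch of Sarsa(0) with a visit-count-based epsilon-greedy GLIE schedule. The `env` interface (`reset`/`step`), the particular decay rates, and all parameter values are illustrative assumptions for this sketch, not the paper's exact construction.

```python
import numpy as np

def sarsa_glie(env, n_states, n_actions, n_episodes=5000, gamma=0.95, seed=0):
    """Tabular Sarsa(0) with a GLIE epsilon-greedy policy (sketch).

    Assumed env interface: env.reset() -> state,
    env.step(action) -> (next_state, reward, done).
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    state_visits = np.zeros(n_states)            # drives exploration decay
    sa_visits = np.zeros((n_states, n_actions))  # drives step-size decay

    def act(s):
        # GLIE schedule: epsilon -> 0 as state s is visited more often,
        # yet every action is still tried infinitely often in the limit.
        eps = 1.0 / (1.0 + state_visits[s])
        if rng.random() < eps:
            return int(rng.integers(n_actions))
        return int(np.argmax(Q[s]))

    for _ in range(n_episodes):
        s = env.reset()
        a = act(s)
        done = False
        while not done:
            s2, r, done = env.step(a)
            state_visits[s] += 1
            sa_visits[s, a] += 1
            a2 = act(s2)  # on-policy: the next action comes from the same policy
            # 1/n step size satisfies the usual stochastic-approximation conditions
            alpha = 1.0 / sa_visits[s, a]
            target = r if done else r + gamma * Q[s2, a2]
            Q[s, a] += alpha * (target - Q[s, a])
            s, a = s2, a2
    return Q
```

Because the update bootstraps from the action the behavior policy actually takes (`Q[s2, a2]` rather than `max_a Q[s2, a]`), exploration and learning are entangled; the GLIE schedule is what lets such an algorithm converge to optimal values and policies despite that entanglement.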
Cite
Text
Singh et al. "Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms." Machine Learning, 2000. doi:10.1023/A:1007678930559
Markdown
[Singh et al. "Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms." Machine Learning, 2000.](https://mlanthology.org/mlj/2000/singh2000mlj-convergence/) doi:10.1023/A:1007678930559
BibTeX
@article{singh2000mlj-convergence,
  title = {{Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms}},
  author = {Singh, Satinder and Jaakkola, Tommi S. and Littman, Michael L. and Szepesvári, Csaba},
  journal = {Machine Learning},
  volume = {38},
  pages = {287-308},
  year = {2000},
  doi = {10.1023/A:1007678930559},
  url = {https://mlanthology.org/mlj/2000/singh2000mlj-convergence/}
}