A Learning Rate Analysis of Reinforcement Learning Algorithms in Finite-Horizon

Abstract

Many reinforcement learning algorithms, such as Q-Learning or R-Learning, are adaptive methods for solving infinite-horizon Markov decision problems when no model is available. In this article we consider the particular framework of nonstationary finite-horizon Markov Decision Processes. After establishing a relationship between the finite-horizon total reward criterion and the finite-horizon average-reward criterion, we define QH-Learning and RH-Learning for finite-horizon MDPs. We then introduce the Ordinary Differential Equation (ODE) method to conduct a learning rate analysis of QH-Learning and RH-Learning. RH-Learning appears to be a version of QH-Learning with matrix-valued stepsizes, the corresponding gain matrix being very close to the optimal matrix that results from the ODE analysis. Experimental results confirm this performance hierarchy.

1 Introduction

The search for optimal policies in Markov Decision Processes has been deeply studied according t...
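To illustrate the finite-horizon setting the abstract describes, here is a minimal sketch of a tabular Q-learning update with one Q-table per stage, checked against backward induction. It is not the paper's definition of QH-Learning, which this excerpt does not give. The two-state MDP, the function names, the constant step size `alpha`, and the uniform exploration scheme are all assumptions made for illustration, and the toy model is stationary and deterministic to keep the example short.

```python
import random


def backward_induction(P, R, H):
    """Exact stage-indexed Q-values by dynamic programming (requires the model)."""
    n_s, n_a = len(R), len(R[0])
    # Q[t][s][a]; the extra stage H holds the zero terminal values.
    Q = [[[0.0] * n_a for _ in range(n_s)] for _ in range(H + 1)]
    for t in range(H - 1, -1, -1):
        for s in range(n_s):
            for a in range(n_a):
                Q[t][s][a] = R[s][a] + max(Q[t + 1][P[s][a]])
    return Q


def finite_horizon_q_learning(P, R, H, episodes=5000, alpha=0.5, seed=0):
    """Model-free estimate of the same Q-values from sampled transitions.

    Assumed update (undiscounted total reward over H steps):
        Q[t][s][a] += alpha * (r + max_b Q[t+1][s'][b] - Q[t][s][a])
    """
    rng = random.Random(seed)
    n_s, n_a = len(R), len(R[0])
    Q = [[[0.0] * n_a for _ in range(n_s)] for _ in range(H + 1)]
    for _ in range(episodes):
        s = rng.randrange(n_s)          # random start state
        for t in range(H):
            a = rng.randrange(n_a)      # uniform exploration (an assumption)
            s_next, r = P[s][a], R[s][a]
            target = r + max(Q[t + 1][s_next])
            Q[t][s][a] += alpha * (target - Q[t][s][a])
            s = s_next
    return Q


# Hypothetical deterministic MDP: P[s][a] is the next state, R[s][a] the reward.
P = [[0, 1], [1, 0]]
R = [[0.0, 1.0], [2.0, 0.0]]
H = 3

Q_exact = backward_induction(P, R, H)
Q_learned = finite_horizon_q_learning(P, R, H)
```

Indexing the table by stage is what distinguishes this from the infinite-horizon update: the target at stage `t` bootstraps from stage `t + 1`, and the values at stage `H` stay at zero. On this toy problem the learned table matches the dynamic-programming result to within floating-point tolerance.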

Cite

Text

Garçia and Ndiaye. "A Learning Rate Analysis of Reinforcement Learning Algorithms in Finite-Horizon." International Conference on Machine Learning, 1998.

Markdown

[Garçia and Ndiaye. "A Learning Rate Analysis of Reinforcement Learning Algorithms in Finite-Horizon." International Conference on Machine Learning, 1998.](https://mlanthology.org/icml/1998/garcia1998icml-learning/)

BibTeX

@inproceedings{garcia1998icml-learning,
  title     = {{A Learning Rate Analysis of Reinforcement Learning Algorithms in Finite-Horizon}},
  author    = {Garçia, Frédérick and Ndiaye, Seydina M.},
  booktitle = {International Conference on Machine Learning},
  year      = {1998},
  pages     = {215--223},
  url       = {https://mlanthology.org/icml/1998/garcia1998icml-learning/}
}