A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms

Abstract

We present a distributional approach to theoretical analyses of reinforcement learning algorithms with constant step-sizes. We demonstrate its effectiveness by presenting simple and unified proofs of convergence for a variety of commonly-used methods. We show that value-based methods such as TD(λ) and Q-Learning have update rules which are contractive in the space of distributions of functions, thus establishing their exponentially fast convergence to a stationary distribution. We demonstrate that the stationary distribution obtained by any algorithm whose target is an expected Bellman update has a mean equal to the true value function. Furthermore, we establish that the distributions concentrate around their mean as the step-size shrinks. We further analyse the optimistic policy iteration algorithm, for which the contraction property does not hold, and formulate a probabilistic policy improvement property which entails the convergence of the algorithm.
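The sketch below is a minimal numerical illustration (not taken from the paper) of the abstract's claims for constant step-size TD(0): on a small, hypothetical Markov reward process, the iterates settle into a stationary distribution whose empirical mean is close to the true value function, and which concentrates as the step-size shrinks. The transition matrix, rewards, step-sizes, and iteration counts are arbitrary choices for illustration only.

```python
# Illustrative sketch: constant step-size TD(0) on a 2-state Markov reward process.
# Claims illustrated (empirically, under the assumptions above):
#   - the iterates reach a stationary distribution,
#   - its mean is close to the true value function,
#   - it concentrates as the step-size alpha shrinks.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical MRP: transition matrix P, rewards r, discount gamma.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
r = np.array([1.0, 0.0])
gamma = 0.9

# True value function: V = (I - gamma * P)^{-1} r.
V_true = np.linalg.solve(np.eye(2) - gamma * P, r)

def td0_samples(alpha, n_steps=200_000, burn_in=50_000):
    """Run constant step-size TD(0) and return the post-burn-in iterates."""
    V = np.zeros(2)
    s = 0
    samples = []
    for t in range(n_steps):
        s_next = rng.choice(2, p=P[s])
        # Sampled update whose expectation over s_next is the Bellman update.
        V[s] += alpha * (r[s] + gamma * V[s_next] - V[s])
        s = s_next
        if t >= burn_in:
            samples.append(V.copy())
    return np.array(samples)

for alpha in (0.1, 0.01):
    samples = td0_samples(alpha)
    print(f"alpha={alpha}: mean={samples.mean(axis=0).round(3)}, "
          f"std={samples.std(axis=0).round(3)}, V_true={V_true.round(3)}")
```

Running the sketch, the empirical standard deviation of the iterates shrinks noticeably when alpha is reduced from 0.1 to 0.01, while the empirical mean stays near V_true, mirroring the concentration result described in the abstract.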

Cite

Text

Amortila et al. "A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms." Artificial Intelligence and Statistics, 2020.

Markdown

[Amortila et al. "A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms." Artificial Intelligence and Statistics, 2020.](https://mlanthology.org/aistats/2020/amortila2020aistats-distributional/)

BibTeX

@inproceedings{amortila2020aistats-distributional,
  title     = {{A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms}},
  author    = {Amortila, Philip and Precup, Doina and Panangaden, Prakash and Bellemare, Marc G.},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2020},
  pages     = {4357--4366},
  volume    = {108},
  url       = {https://mlanthology.org/aistats/2020/amortila2020aistats-distributional/}
}