Finite Sample Analysis of Two-Timescale Stochastic Approximation with Applications to Reinforcement Learning

Abstract

Two-timescale Stochastic Approximation (SA) algorithms are widely used in Reinforcement Learning (RL). Their iterates have two parts that are updated using distinct stepsizes. In this work, we develop a novel recipe for their finite sample analysis. Using this, we provide a concentration bound, which is the first such result for a two-timescale SA. The type of bound we obtain is known as “lock-in probability”. We also introduce a new projection scheme, in which the time between successive projections increases exponentially. This scheme allows one to elegantly transform a lock-in probability into a convergence rate result for projected two-timescale SA. From this latter result, we then extract key insights on stepsize selection. As an application, we finally obtain convergence rates for the projected two-timescale RL algorithms GTD(0), GTD2, and TDC.

Cite

Text

Dalal et al. "Finite Sample Analysis of Two-Timescale Stochastic Approximation with Applications to Reinforcement Learning." Annual Conference on Computational Learning Theory, 2018.

Markdown

[Dalal et al. "Finite Sample Analysis of Two-Timescale Stochastic Approximation with Applications to Reinforcement Learning." Annual Conference on Computational Learning Theory, 2018.](https://mlanthology.org/colt/2018/dalal2018colt-finite/)

BibTeX

@inproceedings{dalal2018colt-finite,
  title     = {{Finite Sample Analysis of Two-Timescale Stochastic Approximation with Applications to Reinforcement Learning}},
  author    = {Dalal, Gal and Thoppe, Gugan and Szörényi, Balázs and Mannor, Shie},
  booktitle = {Annual Conference on Computational Learning Theory},
  year      = {2018},
  pages     = {1199-1233},
  url       = {https://mlanthology.org/colt/2018/dalal2018colt-finite/}
}