Finite-Sample Analysis of Greedy-GQ with Linear Function Approximation Under Markovian Noise

Abstract

Greedy-GQ is an off-policy two-timescale algorithm for optimal control in reinforcement learning. This paper develops the first finite-sample analysis of the Greedy-GQ algorithm with linear function approximation under Markovian noise. Our analysis provides theoretical justification for choosing the stepsizes of this two-timescale algorithm for faster convergence in practice, and suggests a trade-off between the convergence rate and the quality of the obtained policy. The paper extends finite-sample analyses of two-timescale reinforcement learning algorithms from policy evaluation to optimal control, which is of more practical interest. Specifically, in contrast to existing finite-sample analyses of two-timescale methods such as GTD, GTD2 and TDC, whose objective functions are convex, the objective function of the Greedy-GQ algorithm is non-convex. Moreover, Greedy-GQ is not a linear two-timescale stochastic approximation algorithm. The techniques in this paper provide a general framework for finite-sample analysis of non-convex value-based reinforcement learning algorithms for optimal control.
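For readers unfamiliar with the algorithm, the following is a minimal sketch of one Greedy-GQ update with linear features, written in the form in which the update is commonly stated in the literature. It is an illustration under assumptions, not the exact variant analyzed in the paper: the function name `greedy_gq_step`, its parameters, and the use of a hard `argmax` over next-state actions are all choices made here for illustration. It shows the two coupled iterates that make the algorithm two-timescale: the main weights `theta` updated with stepsize `alpha`, and the auxiliary weights `w` updated with stepsize `beta`.

```python
import numpy as np


def greedy_gq_step(theta, w, phi, reward, next_phis, alpha, beta, gamma=0.99):
    """One Greedy-GQ update with linear function approximation (illustrative).

    theta     : main weight vector, shape (d,)
    w         : auxiliary weight vector, shape (d,)
    phi       : features of the current (state, action) pair, shape (d,)
    reward    : observed scalar reward
    next_phis : features of every action in the next state, shape (A, d)
    alpha     : stepsize for theta
    beta      : stepsize for w
    gamma     : discount factor
    """
    # Action values in the next state under the current estimate.
    q_next = next_phis @ theta
    # Features of the greedy next action (hard argmax, an assumption here).
    phi_hat = next_phis[np.argmax(q_next)]

    # Temporal-difference error with a greedy (max) target.
    delta = reward + gamma * q_next.max() - phi @ theta

    # Main update: TD term plus a gradient-correction term that uses w.
    theta_new = theta + alpha * (delta * phi - gamma * (w @ phi) * phi_hat)
    # Auxiliary update: w tracks a least-squares estimate of the TD error.
    w_new = w + beta * (delta - phi @ w) * phi
    return theta_new, w_new


# Usage on a single hand-made transition with two features and two actions.
theta, w = np.zeros(2), np.zeros(2)
phi = np.array([1.0, 0.0])
next_phis = np.array([[0.0, 1.0], [1.0, 0.0]])
theta, w = greedy_gq_step(theta, w, phi, 1.0, next_phis,
                          alpha=0.1, beta=0.5, gamma=0.9)
```

The greedy target depends on `theta` through the maximizing action, so the update is not linear in the iterates, which is consistent with the abstract's remark that Greedy-GQ is not a linear two-timescale stochastic approximation algorithm.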

Cite

Text

Wang and Zou. "Finite-Sample Analysis of Greedy-GQ with Linear Function Approximation Under Markovian Noise." Uncertainty in Artificial Intelligence, 2020.

Markdown

[Wang and Zou. "Finite-Sample Analysis of Greedy-GQ with Linear Function Approximation Under Markovian Noise." Uncertainty in Artificial Intelligence, 2020.](https://mlanthology.org/uai/2020/wang2020uai-finitesample/)

BibTeX

@inproceedings{wang2020uai-finitesample,
  title     = {{Finite-Sample Analysis of Greedy-GQ with Linear Function Approximation Under Markovian Noise}},
  author    = {Wang, Yue and Zou, Shaofeng},
  booktitle = {Uncertainty in Artificial Intelligence},
  year      = {2020},
  pages     = {11--20},
  volume    = {124},
  url       = {https://mlanthology.org/uai/2020/wang2020uai-finitesample/}
}