Finite-Sample Analysis for SARSA with Linear Function Approximation

NeurIPS 2019 pp. 8668-8678

/neurips/2019/zou2019neurips-finitesample/

Abstract

SARSA is an on-policy algorithm to learn a Markov decision process policy in reinforcement learning. We investigate the SARSA algorithm with linear function approximation under the non-i.i.d.\ setting, where a single sample trajectory is available. With a Lipschitz continuous policy improvement operator that is smooth enough, SARSA has been shown to converge asymptotically. However, its non-asymptotic analysis is challenging and remains unsolved due to the non-i.i.d. samples, and the fact that the behavior policy changes dynamically with time. In this paper, we develop a novel technique to explicitly characterize the stochastic bias of a type of stochastic approximation procedures with time-varying Markov transition kernels. Our approach enables non-asymptotic convergence analyses of this type of stochastic approximation algorithms, which may be of independent interest. Using our bias characterization technique and a gradient descent type of analysis, we further provide the finite-sample analysis on the mean square error of the SARSA algorithm. In the end, we present a fitted SARSA algorithm, which includes the original SARSA algorithm and its variant as special cases. This fitted SARSA algorithm provides a framework for \textit{iterative} on-policy fitted policy iteration, which is more memory and computationally efficient. For this fitted SARSA algorithm, we also present its finite-sample analysis.

PDF NeurIPS Semantic Scholar

Cite

Text

Zou et al. "Finite-Sample Analysis for SARSA with Linear Function Approximation." Neural Information Processing Systems, 2019.

Markdown

[Zou et al. "Finite-Sample Analysis for SARSA with Linear Function Approximation." Neural Information Processing Systems, 2019.](https://mlanthology.org/neurips/2019/zou2019neurips-finitesample/)

BibTeX

@inproceedings{zou2019neurips-finitesample,
  title     = {{Finite-Sample Analysis for SARSA with Linear Function Approximation}},
  author    = {Zou, Shaofeng and Xu, Tengyu and Liang, Yingbin},
  booktitle = {Neural Information Processing Systems},
  year      = {2019},
  pages     = {8668-8678},
  url       = {https://mlanthology.org/neurips/2019/zou2019neurips-finitesample/}
}