Gaussian Approximation and Multiplier Bootstrap for Polyak-Ruppert Averaged Linear Stochastic Approximation with Applications to TD Learning

Abstract

In this paper, we obtain the Berry–Esseen bound for multivariate normal approximation for the Polyak-Ruppert averaged iterates of the linear stochastic approximation (LSA) algorithm with decreasing step size. Moreover, we prove the non-asymptotic validity of multiplier-bootstrap confidence intervals for the parameter estimated by LSA. The bootstrap procedure updates the LSA estimate together with a set of randomly perturbed LSA estimates upon the arrival of each new observation. We illustrate our findings in the setting of temporal difference learning with linear function approximation.
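Bootstrap sketch

The multiplier bootstrap described in the abstract admits a simple online form: each incoming observation updates the plain LSA iterate and every perturbed copy, and all of them are Polyak-Ruppert averaged on the fly. The NumPy sketch below is not the authors' code; the function name lsa_multiplier_bootstrap, the k^{-1/2} step-size schedule, and the exponential multipliers (mean 1, variance 1) are illustrative assumptions.

import numpy as np

def lsa_multiplier_bootstrap(sample_Ab, n_steps, dim, n_boot=50,
                             step=lambda k: 0.5 / np.sqrt(k), seed=0):
    """Polyak-Ruppert averaged LSA with an online multiplier bootstrap.

    sample_Ab(rng) returns a noisy pair (A_k, b_k) such that the target
    theta* solves E[A] theta* = E[b].
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(dim)                  # plain LSA iterate
    theta_boot = np.zeros((n_boot, dim))   # randomly perturbed copies
    avg = np.zeros(dim)
    avg_boot = np.zeros((n_boot, dim))

    for k in range(1, n_steps + 1):
        A_k, b_k = sample_Ab(rng)
        alpha = step(k)
        theta = theta - alpha * (A_k @ theta - b_k)
        # the same observation updates all perturbed trajectories, each
        # scaled by its own i.i.d. multiplier with mean 1 and variance 1
        w = rng.exponential(1.0, size=n_boot)   # illustrative choice
        theta_boot = theta_boot - alpha * w[:, None] * (theta_boot @ A_k.T - b_k)
        # running Polyak-Ruppert averages of all trajectories
        avg += (theta - avg) / k
        avg_boot += (theta_boot - avg_boot) / k

    # bootstrap quantile of the deviation yields a confidence radius
    dev = np.linalg.norm(avg_boot - avg, axis=1)
    return avg, np.quantile(dev, 0.95)

For the TD-learning application with linear features phi, the noisy pair takes the form A_k = phi(s_k) (phi(s_k) - gamma * phi(s_{k+1}))^T and b_k = r_k * phi(s_k), so any sampler producing this pair can be passed as sample_Ab.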

Cite

Text

Samsonov et al. "Gaussian Approximation and Multiplier Bootstrap for Polyak-Ruppert Averaged Linear Stochastic Approximation with Applications to TD Learning." Neural Information Processing Systems, 2024. doi:10.52202/079017-0396

Markdown

[Samsonov et al. "Gaussian Approximation and Multiplier Bootstrap for Polyak-Ruppert Averaged Linear Stochastic Approximation with Applications to TD Learning." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/samsonov2024neurips-gaussian/) doi:10.52202/079017-0396

BibTeX

@inproceedings{samsonov2024neurips-gaussian,
  title     = {{Gaussian Approximation and Multiplier Bootstrap for Polyak-Ruppert Averaged Linear Stochastic Approximation with Applications to TD Learning}},
  author    = {Samsonov, Sergey and Moulines, Eric and Shao, Qi-Man and Zhang, Zhuo-Song and Naumov, Alexey},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-0396},
  url       = {https://mlanthology.org/neurips/2024/samsonov2024neurips-gaussian/}
}