Small Steps No More: Global Convergence of Stochastic Gradient Bandits for Arbitrary Learning Rates

Abstract

We provide a new understanding of the stochastic gradient bandit algorithm by showing that it converges to a globally optimal policy almost surely using *any* constant learning rate. This result demonstrates that the stochastic gradient algorithm continues to balance exploration and exploitation appropriately even in scenarios where standard smoothness and noise control assumptions break down. The proofs are based on novel findings about action sampling rates and the relationship between cumulative progress and noise, and extend the current understanding of how simple stochastic gradient methods behave in bandit settings.
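For readers unfamiliar with the algorithm the abstract refers to, here is a minimal sketch of a stochastic gradient bandit with a softmax parameterization and a constant learning rate. The bandit instance (reward means, noise level) and the specific step size are illustrative assumptions, not taken from the paper; only the update rule itself is the standard on-policy softmax gradient estimator.

```python
# Sketch of a softmax stochastic gradient bandit with a constant step size.
# The reward means and eta below are assumed values for illustration.
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())  # shift for numerical stability
    return z / z.sum()

rng = np.random.default_rng(0)
mean_rewards = np.array([0.2, 0.5, 0.9])  # assumed 3-armed bandit instance
theta = np.zeros(3)                        # policy logits
eta = 1.0                                  # constant learning rate

for t in range(20000):
    pi = softmax(theta)
    a = rng.choice(3, p=pi)                    # sample action from the policy
    r = mean_rewards[a] + rng.normal(0, 0.1)   # observe a noisy reward
    # Unbiased stochastic gradient of expected reward w.r.t. the logits:
    #   grad_j = r * (1{j == a} - pi_j)
    grad = r * (np.eye(3)[a] - pi)
    theta += eta * grad                        # constant-step-size update

print(softmax(theta))  # the policy concentrates on a single arm over time
```

Note that the gradient estimate always sums to zero across arms (since the indicator and the policy both sum to one), so the update shifts probability mass between arms rather than scaling it; the paper's result is that this update finds the optimal arm almost surely for any constant `eta`, not just small ones.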

Cite

Text

Mei et al. "Small Steps No More: Global Convergence of Stochastic Gradient Bandits for Arbitrary Learning Rates." Neural Information Processing Systems, 2024. doi:10.52202/079017-2371

Markdown

[Mei et al. "Small Steps No More: Global Convergence of Stochastic Gradient Bandits for Arbitrary Learning Rates." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/mei2024neurips-small/) doi:10.52202/079017-2371

BibTeX

@inproceedings{mei2024neurips-small,
  title     = {{Small Steps No More: Global Convergence of Stochastic Gradient Bandits for Arbitrary Learning Rates}},
  author    = {Mei, Jincheng and Dai, Bo and Agarwal, Alekh and Vaswani, Sharan and Raj, Anant and Szepesvári, Csaba and Schuurmans, Dale},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-2371},
  url       = {https://mlanthology.org/neurips/2024/mei2024neurips-small/}
}