Non-Asymptotic Guarantees for Average-Reward Q-Learning with Adaptive Stepsizes

Abstract

This work presents the first finite-time analysis of average-reward $Q$-learning with an asynchronous implementation. A key feature of the algorithm we study is the use of adaptive stepsizes that act as local clocks for each state-action pair. We show that the mean-square error of this $Q$-learning algorithm, measured in the span seminorm, converges to zero at a rate of $\tilde{\mathcal{O}}(1/k)$. In establishing this result, we show that adaptive stepsizes are necessary: without them, the algorithm fails to converge to the correct target. Moreover, adaptive stepsizes can be viewed as a form of implicit importance sampling that counteracts the effect of asynchronous updates. Technically, adaptive stepsizes make each $Q$-learning update depend on the full sample history. This introduces strong correlations and turns the algorithm into a non-Markovian stochastic approximation (SA) scheme. Our approach to overcoming this challenge combines (1) a time-inhomogeneous Markovian reformulation of non-Markovian SA, and (2) almost-sure time-varying bounds, conditioning arguments, and Markov chain concentration inequalities that break the strong correlations between the adaptive stepsizes and the iterates.
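To make the update rule concrete, the following is a minimal Python/NumPy sketch of asynchronous average-reward $Q$-learning with counter-based adaptive stepsizes. It is not the paper's implementation. The stepsize form `alpha / (N(s,a) + h)`, the uniform-random behaviour policy, the random test MDP, and the helper names (`async_q_learning`, `relative_value_iteration`, `span_seminorm`, `bellman`) are all illustrative assumptions. The sketch's update subtracts no average-reward estimate, so its iterates can drift by a common constant. For that reason the error is measured in the span seminorm $\mathrm{sp}(x) = \max_i x_i - \min_i x_i$, which ignores such constant shifts.

```python
import numpy as np


def span_seminorm(x: np.ndarray) -> float:
    """sp(x) = max(x) - min(x); it is zero exactly when x is a constant vector."""
    return float(np.max(x) - np.min(x))


def bellman(Q: np.ndarray, P: np.ndarray, R: np.ndarray) -> np.ndarray:
    """Undiscounted Bellman optimality operator (TQ)(s,a) = R(s,a) + E[max_a' Q(s',a')].

    P has shape (S, A, S) and R has shape (S, A).
    """
    return R + P @ Q.max(axis=1)


def relative_value_iteration(P, R, iters=2000):
    """Reference Q* (up to an additive constant), assuming a unichain, aperiodic MDP."""
    Q = np.zeros_like(R)
    for _ in range(iters):
        TQ = bellman(Q, P, R)
        Q = TQ - TQ[0, 0]  # subtract a reference entry to keep the iterates bounded
    return Q


def async_q_learning(P, R, num_steps, alpha=1.0, h=1.0, seed=0):
    """Asynchronous average-reward Q-learning along a single trajectory.

    Illustrative assumptions: uniform-random behaviour policy and stepsize
    alpha / (N(s,a) + h), where N(s,a) counts visits to (s,a) so far
    (including the current one). Only the visited pair is updated.
    """
    rng = np.random.default_rng(seed)
    S, A = R.shape
    Q = np.zeros((S, A))
    counts = np.zeros((S, A))
    s = 0
    for _ in range(num_steps):
        a = rng.integers(A)
        s_next = rng.choice(S, p=P[s, a])
        counts[s, a] += 1
        step = alpha / (counts[s, a] + h)  # local clock of the pair (s, a)
        td_error = R[s, a] + Q[s_next].max() - Q[s, a]
        Q[s, a] += step * td_error
        s = s_next
    return Q, counts


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    S, A = 4, 2
    P = rng.dirichlet(np.ones(S), size=(S, A))  # random transition kernel, shape (S, A, S)
    R = rng.uniform(size=(S, A))
    Q_star = relative_value_iteration(P, R)
    Q, counts = async_q_learning(P, R, num_steps=20_000)
    print("span error:", span_seminorm(Q - Q_star))
    print("visit counts per pair:\n", counts)
```

Suppose pair $(s,a)$ is visited roughly $k\,\mu(s,a)$ times by step $k$, where $\mu$ is the behaviour policy's stationary visitation distribution. Then the counter-based stepsize behaves like $\alpha/(k\,\mu(s,a))$ and divides out the visitation frequency. This is one way to read the implicit-importance-sampling remark in the abstract. With a single stepsize shared by all pairs, each pair would instead be updated at an effective rate proportional to $\mu(s,a)$.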

Cite

Chen. "Non-Asymptotic Guarantees for Average-Reward Q-Learning with Adaptive Stepsizes." Advances in Neural Information Processing Systems, 2025.

BibTeX

@inproceedings{chen2025neurips-nonasymptotic,
  title     = {{Non-Asymptotic Guarantees for Average-Reward Q-Learning with Adaptive Stepsizes}},
  author    = {Chen, Zaiwei},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/chen2025neurips-nonasymptotic/}
}