The Mean-Squared Error of Double Q-Learning

Abstract

In this paper, we establish a theoretical comparison between the asymptotic mean-squared errors of double Q-learning and Q-learning. Our result builds on an analysis of linear stochastic approximation based on Lyapunov equations and applies to both the tabular setting and the setting with linear function approximation, provided that the optimal policy is unique and the algorithms converge. We show that the asymptotic mean-squared error of double Q-learning is exactly equal to that of Q-learning if double Q-learning uses twice the learning rate of Q-learning and outputs the average of its two estimators. We also present practical implications of this theoretical observation using simulations.
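The sketch below (Python/NumPy, not from the paper) illustrates the correspondence stated in the abstract on a toy problem: tabular Q-learning with step size alpha versus double Q-learning with step size 2*alpha whose output is the average of its two estimators. The MDP, behavior policy, reward noise, discount factor, and step counts are all assumptions chosen for illustration.

import numpy as np

# Illustrative sketch only: a toy tabular experiment, not the authors' setup.
rng = np.random.default_rng(0)

# Hypothetical 2-state, 2-action MDP with random transitions and noisy rewards.
n_states, n_actions, gamma = 2, 2, 0.9
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # transition probabilities
R = rng.normal(size=(n_states, n_actions))                        # mean rewards

def step(s, a):
    s_next = rng.choice(n_states, p=P[s, a])
    r = R[s, a] + rng.normal(scale=0.5)  # noisy reward observation
    return r, s_next

def q_learning(alpha, n_steps=50_000):
    Q = np.zeros((n_states, n_actions))
    s = 0
    for _ in range(n_steps):
        a = rng.integers(n_actions)  # uniform behavior policy
        r, s_next = step(s, a)
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
    return Q

def double_q_learning(alpha, n_steps=50_000):
    QA = np.zeros((n_states, n_actions))
    QB = np.zeros((n_states, n_actions))
    s = 0
    for _ in range(n_steps):
        a = rng.integers(n_actions)
        r, s_next = step(s, a)
        if rng.random() < 0.5:
            # Update A using B's value of A's greedy next action.
            a_star = QA[s_next].argmax()
            QA[s, a] += alpha * (r + gamma * QB[s_next, a_star] - QA[s, a])
        else:
            # Symmetric update of B using A's value of B's greedy next action.
            b_star = QB[s_next].argmax()
            QB[s, a] += alpha * (r + gamma * QA[s_next, b_star] - QB[s, a])
        s = s_next
    return (QA + QB) / 2  # output the average of the two estimators

alpha = 0.01
Q_single = q_learning(alpha)
Q_double = double_q_learning(2 * alpha)  # twice the learning rate, per the stated correspondence
print("Q-learning estimate:\n", Q_single)
print("Double Q-learning (averaged) estimate:\n", Q_double)

Note that estimating the asymptotic mean-squared error itself would require averaging over many independent runs; this sketch only shows the algorithmic correspondence (doubled learning rate, averaged output) that the theoretical result concerns.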

Cite

Text

Weng et al. "The Mean-Squared Error of Double Q-Learning." Neural Information Processing Systems, 2020.

Markdown

[Weng et al. "The Mean-Squared Error of Double Q-Learning." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/weng2020neurips-meansquared/)

BibTeX

@inproceedings{weng2020neurips-meansquared,
  title     = {{The Mean-Squared Error of Double Q-Learning}},
  author    = {Weng, Wentao and Gupta, Harsh and He, Niao and Ying, Lei and Srikant, R.},
  booktitle = {Neural Information Processing Systems},
  year      = {2020},
  url       = {https://mlanthology.org/neurips/2020/weng2020neurips-meansquared/}
}