Regularized Q-Learning Through Robust Averaging

Abstract

We propose a new Q-learning variant, called 2RA Q-learning, that addresses some weaknesses of existing Q-learning methods in a principled manner. One such weakness is an underlying estimation bias which cannot be controlled and often results in poor performance. We propose a distributionally robust estimator for the maximum expected value term, which allows us to precisely control the level of estimation bias introduced. The distributionally robust estimator admits a closed-form solution such that the proposed algorithm has a computational cost per iteration comparable to Watkins’ Q-learning. For the tabular case, we show that 2RA Q-learning converges to the optimal policy and analyze its asymptotic mean-squared error. Lastly, we conduct numerical experiments for various settings, which corroborate our theoretical findings and indicate that 2RA Q-learning often performs better than existing methods.
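The paper's exact closed-form 2RA estimator is not reproduced on this page. As a rough illustration of the idea the abstract describes, the sketch below (plain NumPy) replaces the max-of-single-estimate target in tabular Q-learning with a variance-penalized average over an ensemble of Q-estimates, where the penalty weight rho acts as the knob controlling the level of estimation bias. The ensemble size n_est, the variance-based ambiguity set behind robust_avg, and all names here are illustrative assumptions, not the authors' construction.

import numpy as np

def robust_avg(estimates, rho):
    # Variance-penalized average: one plausible closed-form robust mean
    # (illustrative; the paper's 2RA estimator may use a different ambiguity set).
    return estimates.mean() - np.sqrt(rho * estimates.var())

def two_ra_q_learning(P, R, gamma=0.95, rho=0.1, n_est=5,
                      alpha=0.1, eps=0.1, steps=50000, seed=0):
    """Tabular ensemble Q-learning with a robust-average target (sketch).
    P: (S, A, S) transition probabilities, R: (S, A) expected rewards."""
    rng = np.random.default_rng(seed)
    S, A, _ = P.shape
    Q = np.zeros((n_est, S, A))  # ensemble of Q-estimates
    s = 0
    for _ in range(steps):
        # epsilon-greedy on the ensemble mean
        if rng.random() < eps:
            a = int(rng.integers(A))
        else:
            a = int(Q.mean(0)[s].argmax())
        s_next = rng.choice(S, p=P[s, a])
        r = R[s, a]
        # Robust estimate of the maximum expected next value:
        # penalized ensemble average per action, then max over actions.
        target_vals = [robust_avg(Q[:, s_next, b], rho) for b in range(A)]
        target = r + gamma * max(target_vals)
        i = rng.integers(n_est)  # update one randomly chosen estimator
        Q[i, s, a] += alpha * (target - Q[i, s, a])
        s = s_next
    return Q.mean(0)

if __name__ == "__main__":
    # Tiny random MDP, purely for demonstration.
    rng = np.random.default_rng(1)
    S, A = 6, 3
    P = rng.random((S, A, S))
    P /= P.sum(-1, keepdims=True)
    R = rng.random((S, A))
    Q = two_ra_q_learning(P, R)
    print("Greedy policy:", Q.argmax(1))

With rho = 0 the sketch reduces to plain ensemble-average Q-learning; increasing rho injects a controllable pessimism into the target that counteracts the overestimation bias of the max operator, matching the bias-control role the abstract attributes to the distributionally robust estimator. The per-step cost stays close to Watkins' Q-learning since the penalized average has a closed form.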

Cite

Text

Schmitt-Förster and Sutter. "Regularized Q-Learning Through Robust Averaging." International Conference on Machine Learning, 2024.

Markdown

[Schmitt-Förster and Sutter. "Regularized Q-Learning Through Robust Averaging." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/schmittforster2024icml-regularized/)

BibTeX

@inproceedings{schmittforster2024icml-regularized,
  title     = {{Regularized Q-Learning Through Robust Averaging}},
  author    = {Schmitt-Förster, Peter and Sutter, Tobias},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {43742--43764},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/schmittforster2024icml-regularized/}
}