A Finite Sample Complexity Bound for Distributionally Robust Q-Learning

Abstract

We consider a reinforcement learning setting in which the deployment environment is different from the training environment. Applying a robust Markov decision process formulation, we extend the distributionally robust Q-learning framework studied in [Liu et al. 2022]. Further, we improve the design and analysis of their multi-level Monte Carlo estimator. Assuming access to a simulator, we prove that the worst-case expected sample complexity of our algorithm for learning the optimal robust Q-function within an $\epsilon$ error in the sup norm is upper bounded by $\tilde O(|S||A|(1-\gamma)^{-5}\epsilon^{-2}p_{\wedge}^{-6}\delta^{-4})$, where $\gamma$ is the discount rate, $p_{\wedge}$ is the non-zero minimal support probability of the transition kernels, and $\delta$ is the uncertainty size. This is the first sample complexity result for the model-free robust RL problem. Simulation studies further validate our theoretical results.
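To make the setting concrete, the sketch below shows one tabular distributionally robust Q-learning update, assuming a KL-divergence uncertainty set of radius $\delta$ and a simple plug-in estimate of the dual $\sup_{\alpha>0}\{-\alpha\log\mathbb{E}[e^{-V(s')/\alpha}]-\alpha\delta\}$ from simulator samples. The uncertainty set, the grid search over $\alpha$, and the plug-in estimator are illustrative assumptions made here for readability, not the authors' algorithm; the paper instead uses a multi-level Monte Carlo construction to estimate the robust Bellman operator.

```python
import numpy as np

# Minimal sketch of a distributionally robust Q-learning step (illustrative
# assumptions: KL uncertainty set of radius delta, plug-in dual estimate,
# grid search over alpha; NOT the paper's multi-level Monte Carlo estimator).

def kl_robust_value(next_values, delta, alphas=np.logspace(-3, 2, 200)):
    """Plug-in estimate of  inf_{KL(p||p0) <= delta} E_p[V(s')]  via the dual
    sup_{alpha>0} -alpha * log E_{p0}[exp(-V/alpha)] - alpha*delta,
    with the expectation replaced by an average over simulator samples."""
    v = np.asarray(next_values, dtype=float)
    m = v.min()
    best = -np.inf
    for a in alphas:
        # numerically stable log-mean-exp of -V/alpha, shifted by the minimum
        dual = -a * np.log(np.mean(np.exp(-(v - m) / a))) + m - a * delta
        best = max(best, dual)
    return best

def robust_q_update(Q, s, a, r, next_states, gamma, delta, lr):
    """One robust Q-learning step for the pair (s, a), given i.i.d. next
    states drawn from a simulator and a Q-table of shape |S| x |A|."""
    next_v = Q[next_states].max(axis=1)        # V(s') = max_a' Q(s', a')
    target = r + gamma * kl_robust_value(next_v, delta)
    Q[s, a] += lr * (target - Q[s, a])
    return Q
```

Repeatedly applying `robust_q_update` over all state-action pairs with a decaying learning rate approximates the robust Q-function under these assumptions; the bias introduced by the plug-in dual estimate is what an unbiased multi-level Monte Carlo estimator, as studied in the paper, is designed to remove.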

Cite

Text

Wang et al. "A Finite Sample Complexity Bound for Distributionally Robust Q-Learning." Artificial Intelligence and Statistics, 2023.

Markdown

[Wang et al. "A Finite Sample Complexity Bound for Distributionally Robust Q-Learning." Artificial Intelligence and Statistics, 2023.](https://mlanthology.org/aistats/2023/wang2023aistats-finite/)

BibTeX

@inproceedings{wang2023aistats-finite,
  title     = {{A Finite Sample Complexity Bound for Distributionally Robust Q-Learning}},
  author    = {Wang, Shengbo and Si, Nian and Blanchet, Jose and Zhou, Zhengyuan},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2023},
  pages     = {3370--3398},
  volume    = {206},
  url       = {https://mlanthology.org/aistats/2023/wang2023aistats-finite/}
}