Finite-Sample Regret Bound for Distributionally Robust Offline Tabular Reinforcement Learning
Abstract
While reinforcement learning has witnessed tremendous success recently in a wide range of domains, robustness–or the lack thereof–remains an important issue that remains inadequately addressed. In this paper, we provide a distributionally robust formulation of offline learning policy in tabular RL that aims to learn a policy from historical data (collected by some other behavior policy) that is robust to the future environment arising as a perturbation of the training environment. We first develop a novel policy evaluation scheme that accurately estimates the robust value (i.e. how robust it is in a perturbed environment) of any given policy and establish its finite-sample estimation error. Building on this, we then develop a novel and minimax-optimal distributionally robust learning algorithm that achieves $O_P\left(1/\sqrt{n}\right)$ regret, meaning that with high probability, the policy learned from using $n$ training data points will be $O\left(1/\sqrt{n}\right)$ close to the optimal distributionally robust policy. Finally, our simulation results demonstrate the superiority of our distributionally robust approach compared to non-robust RL algorithms.
Cite
Text
Zhou et al. "Finite-Sample Regret Bound for Distributionally Robust Offline Tabular Reinforcement Learning." Artificial Intelligence and Statistics, 2021.Markdown
[Zhou et al. "Finite-Sample Regret Bound for Distributionally Robust Offline Tabular Reinforcement Learning." Artificial Intelligence and Statistics, 2021.](https://mlanthology.org/aistats/2021/zhou2021aistats-finitesample/)BibTeX
@inproceedings{zhou2021aistats-finitesample,
title = {{Finite-Sample Regret Bound for Distributionally Robust Offline Tabular Reinforcement Learning}},
author = {Zhou, Zhengqing and Zhou, Zhengyuan and Bai, Qinxun and Qiu, Linhai and Blanchet, Jose and Glynn, Peter},
booktitle = {Artificial Intelligence and Statistics},
year = {2021},
pages = {3331-3339},
volume = {130},
url = {https://mlanthology.org/aistats/2021/zhou2021aistats-finitesample/}
}