Model-Free Robust Average-Reward Reinforcement Learning

Abstract

Robust Markov decision processes (MDPs) address the challenge of model uncertainty by optimizing the worst-case performance over an uncertainty set of MDPs. In this paper, we focus on robust average-reward MDPs in the model-free setting. We first theoretically characterize the structure of solutions to the robust average-reward Bellman equation, which is essential for our later convergence analysis. We then design two model-free algorithms, robust relative value iteration (RVI) TD and robust RVI Q-learning, and theoretically prove their convergence to the optimal solution. We provide several widely used uncertainty sets as examples, including those defined by the contamination model, total variation, Chi-squared divergence, Kullback-Leibler (KL) divergence, and Wasserstein distance.
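To make the flavor of a robust RVI Q-learning update concrete, below is a minimal, illustrative sketch (not the paper's exact algorithm) for the δ-contamination uncertainty set, where the worst-case expected next value has the closed form (1 − δ)·V(s′) + δ·minₛ V(s). The toy MDP, step size, exploration rate, contamination radius δ, and the choice of reference function f(Q) = maxₛ,ₐ Q(s, a) are all assumptions made for this sketch.

```python
import numpy as np

# Hypothetical tabular MDP used only to exercise the update rule (assumed, not from the paper).
rng = np.random.default_rng(0)
n_states, n_actions = 5, 3
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # nominal transition kernel
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))             # rewards

delta = 0.1    # contamination radius (assumed)
alpha = 0.05   # step size (assumed)
Q = np.zeros((n_states, n_actions))
s = 0

for t in range(200_000):
    # epsilon-greedy exploration (assumed behavior policy)
    a = rng.integers(n_actions) if rng.random() < 0.1 else int(np.argmax(Q[s]))
    s_next = rng.choice(n_states, p=P[s, a])   # sample next state from the nominal model
    V = Q.max(axis=1)
    # Worst-case next value under the delta-contamination set:
    # the adversary can move delta probability mass to the worst state,
    # giving the closed form (1 - delta) * V(s') + delta * min_s V(s).
    sigma = (1.0 - delta) * V[s_next] + delta * V.min()
    # Relative value iteration offset f(Q) keeps iterates bounded in the
    # average-reward setting; here we take f(Q) = max_{s,a} Q(s,a).
    f_Q = Q.max()
    target = R[s, a] - f_Q + sigma
    Q[s, a] += alpha * (target - Q[s, a])
    s = s_next

# Under RVI-style updates, f(Q) serves as an estimate of the optimal robust average reward.
print("estimated robust gain:", Q.max())
```

Other uncertainty sets mentioned in the abstract (total variation, Chi-squared, KL, Wasserstein) would replace the `sigma` line with the corresponding worst-case (support-function) estimate; the update structure is otherwise unchanged in this sketch.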

Cite

Text

Wang et al. "Model-Free Robust Average-Reward Reinforcement Learning." International Conference on Machine Learning, 2023.

Markdown

[Wang et al. "Model-Free Robust Average-Reward Reinforcement Learning." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/wang2023icml-modelfree/)

BibTeX

@inproceedings{wang2023icml-modelfree,
  title     = {{Model-Free Robust Average-Reward Reinforcement Learning}},
  author    = {Wang, Yue and Velasquez, Alvaro and Atia, George K. and Prater-Bennette, Ashley and Zou, Shaofeng},
  booktitle = {International Conference on Machine Learning},
  year      = {2023},
  pages     = {36431--36469},
  volume    = {202},
  url       = {https://mlanthology.org/icml/2023/wang2023icml-modelfree/}
}