Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR
Abstract
In this paper, we study risk-sensitive Reinforcement Learning (RL), focusing on the objective of Conditional Value at Risk (CVaR) with risk tolerance $\tau$. Starting with multi-armed bandits (MABs), we show the minimax CVaR regret rate is $\Omega(\sqrt{\tau^{-1}AK})$, where $A$ is the number of actions and $K$ is the number of episodes, and that it is achieved by an Upper Confidence Bound algorithm with a novel Bernstein bonus. For online RL in tabular Markov Decision Processes (MDPs), we show a minimax regret lower bound of $\Omega(\sqrt{\tau^{-1}SAK})$ (with normalized cumulative rewards), where $S$ is the number of states, and we propose a novel bonus-driven Value Iteration procedure. We show that our algorithm achieves the optimal regret of $\widetilde O(\sqrt{\tau^{-1}SAK})$ under a continuity assumption and in general attains a near-optimal regret of $\widetilde O(\tau^{-1}\sqrt{SAK})$, which is minimax-optimal for constant $\tau$. This improves on the best available bounds. By discretizing rewards appropriately, our algorithms are computationally efficient.
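To make the objective concrete, the sketch below shows an empirical CVaR estimator at risk level $\tau$ (the mean of the worst $\tau$-fraction of observed rewards) and a simple optimistic bandit loop built on it. This is only an illustration of the CVaR-UCB idea described in the abstract, not the paper's algorithm: the Hoeffding-style bonus of order $\sqrt{\log K / (\tau n)}$ is a simplified stand-in for the Bernstein bonus, and all names (`empirical_cvar`, `cvar_ucb_bandit`, `bonus_scale`) are hypothetical.

```python
import numpy as np

def empirical_cvar(samples, tau):
    """Empirical CVaR at level tau: average of the worst tau-fraction of samples."""
    samples = np.sort(np.asarray(samples, dtype=float))
    k = max(1, int(np.ceil(tau * len(samples))))  # number of worst samples to average
    return samples[:k].mean()

def cvar_ucb_bandit(arms, tau, K, bonus_scale=1.0, seed=0):
    """Illustrative optimistic bandit loop for the CVaR objective.

    `arms` is a list of callables, each returning a reward in [0, 1].
    The bonus below is a simplified Hoeffding-style term, not the paper's
    Bernstein bonus; it only conveys the optimism-in-the-face-of-uncertainty idea.
    """
    rng = np.random.default_rng(seed)
    history = [[] for _ in arms]
    for _ in range(K):
        ucbs = []
        for obs in history:
            if not obs:  # pull each arm once before using confidence bounds
                ucbs.append(np.inf)
                continue
            bonus = bonus_scale * np.sqrt(np.log(max(K, 2)) / (tau * len(obs)))
            ucbs.append(empirical_cvar(obs, tau) + bonus)
        a_t = int(np.argmax(ucbs))
        history[a_t].append(float(arms[a_t]()))
    return [empirical_cvar(obs, tau) if obs else None for obs in history]
```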
Cite
Text
Wang et al. "Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR." International Conference on Machine Learning, 2023.
Markdown
[Wang et al. "Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/wang2023icml-nearminimaxoptimal/)
BibTeX
@inproceedings{wang2023icml-nearminimaxoptimal,
  title = {{Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR}},
  author = {Wang, Kaiwen and Kallus, Nathan and Sun, Wen},
  booktitle = {International Conference on Machine Learning},
  year = {2023},
  pages = {35864--35907},
  volume = {202},
  url = {https://mlanthology.org/icml/2023/wang2023icml-nearminimaxoptimal/}
}