Minimax Optimal Adversarial Reinforcement Learning

Abstract

Consider episodic Markov decision processes (MDPs) with adversarially chosen transition kernels, where the transition kernel is adversarially chosen at each episode. Prior works have established regret upper bounds of $\widetilde{\mathcal{O}}(\sqrt{T} + C^P)$, where $T$ is the number of episodes and $C^P$ quantifies the degree of adversarial change in the transition dynamics. This regret bound may scale as large as $\mathcal{O}(T)$, leading to a linear regret. This raises a fundamental question: *Can sublinear regret be achieved under fully adversarial transition kernels?* We answer this question affirmatively. First, we show that the optimal policy for MDPs with adversarial transition kernels must be history-dependent. We then design an algorithm of Adversarial Dynamics Follow-the-Regularized-Leader (AD-FTRL), and prove that it achieves a sublinear regret of $\mathcal{O}(\sqrt{(|\mathcal{S}||\mathcal{A}|)^K T})$, where $K$ is the horizon length, $|\mathcal{S}|$ is the number of states, and $|\mathcal{A}|$ is the number of actions. Such a regret cannot be achieved by simply solving this problem as a contextual bandit. We further construct a hard MDP instance and prove a matching lower bound on the regret, which thereby demonstrates the **minimax optimality** of our algorithm.

Cite

Text

Wang et al. "Minimax Optimal Adversarial Reinforcement Learning." International Conference on Learning Representations, 2026.

Markdown

[Wang et al. "Minimax Optimal Adversarial Reinforcement Learning." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/wang2026iclr-minimax/)

BibTeX

@inproceedings{wang2026iclr-minimax,
  title     = {{Minimax Optimal Adversarial Reinforcement Learning}},
  author    = {Wang, Yudan and Ji, Kaiyi and Shi, Ming and Zou, Shaofeng},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/wang2026iclr-minimax/}
}