Minimax Optimal Adversarial Reinforcement Learning

Wang, Yudan; Ji, Kaiyi; Shi, Ming; Zou, Shaofeng

Minimax Optimal Adversarial Reinforcement Learning

Yudan Wang, Kaiyi Ji, Ming Shi, Shaofeng Zou

ICLR 2026

/iclr/2026/wang2026iclr-minimax/

Abstract

Consider episodic Markov decision processes (MDPs) with adversarially chosen transition kernels, where the transition kernel is adversarially chosen at each episode. Prior works have established regret upper bounds of $\widetilde{\mathcal{O}}(\sqrt{T} + C^P)$, where $T$ is the number of episodes and $C^P$ quantifies the degree of adversarial change in the transition dynamics. This regret bound may scale as large as $\mathcal{O}(T)$, leading to a linear regret. This raises a fundamental question: *Can sublinear regret be achieved under fully adversarial transition kernels?* We answer this question affirmatively. First, we show that the optimal policy for MDPs with adversarial transition kernels must be history-dependent. We then design an algorithm of Adversarial Dynamics Follow-the-Regularized-Leader (AD-FTRL), and prove that it achieves a sublinear regret of $\mathcal{O}(\sqrt{(|\mathcal{S}||\mathcal{A}|)^K T})$, where $K$ is the horizon length, $|\mathcal{S}|$ is the number of states, and $|\mathcal{A}|$ is the number of actions. Such a regret cannot be achieved by simply solving this problem as a contextual bandit. We further construct a hard MDP instance and prove a matching lower bound on the regret, which thereby demonstrates the **minimax optimality** of our algorithm.

PDF ICLR OpenReview Semantic Scholar

Cite

Text

Wang et al. "Minimax Optimal Adversarial Reinforcement Learning." International Conference on Learning Representations, 2026.

Markdown

[Wang et al. "Minimax Optimal Adversarial Reinforcement Learning." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/wang2026iclr-minimax/)

BibTeX

@inproceedings{wang2026iclr-minimax,
  title     = {{Minimax Optimal Adversarial Reinforcement Learning}},
  author    = {Wang, Yudan and Ji, Kaiyi and Shi, Ming and Zou, Shaofeng},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/wang2026iclr-minimax/}
}