Logarithmic Regret for Linear Markov Decision Processes with Adversarial Corruptions

Abstract

In this work, we study the logarithmic regret for reinforcement learning (RL) with linear function approximation and adversarial corruptions, in the formulation of linear Markov decision processes (MDPs). Specifically, we consider the case where there exist adversarial corruptions over the reward functions, and the total amount of the corruptions of each step h across all episodes K is bounded by a corruption level C ≥ 0. We propose an algorithm, double-weighted least-squares value iteration with UCB (DW-LSVI-UCB), which leverages weighted linear regressions to learn the (corrupted) unknown reward parameters and unknown transition parameters simultaneously. We prove that DW-LSVI-UCB attains an O( d2H4 log2(1+K/δ) gapmin + CdH2) regret (omitting the dependence on lower order terms), where d is the ambient dimension of the feature mapping, H is the horizon length, gapmin is the minimal sub-optimality gap, and K is the number of episodes. Additionally, when there are no adversarial corruptions over reward functions, the regret of our algorithm improves the previous best result by an O(dH/ log K) factor.

Cite

Text

Zhao et al. "Logarithmic Regret for Linear Markov Decision Processes with Adversarial Corruptions." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I21.34436

Markdown

[Zhao et al. "Logarithmic Regret for Linear Markov Decision Processes with Adversarial Corruptions." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/zhao2025aaai-logarithmic/) doi:10.1609/AAAI.V39I21.34436

BibTeX

@inproceedings{zhao2025aaai-logarithmic,
  title     = {{Logarithmic Regret for Linear Markov Decision Processes with Adversarial Corruptions}},
  author    = {Zhao, Canzhe and Zhang, Xiangcheng and Wang, Baoxiang and Li, Shuai},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {22759-22767},
  doi       = {10.1609/AAAI.V39I21.34436},
  url       = {https://mlanthology.org/aaai/2025/zhao2025aaai-logarithmic/}
}