Relaxed Transition Kernels Can Cure Underestimation in Adversarial Offline Reinforcement Learning

Abstract

Offline reinforcement learning (RL) trains policies from pre-collected data without further environment interaction. However, discrepancies between the dataset and the true environment, particularly in the state transition kernel, can degrade policy performance. To simulate such environment shifts without becoming overly conservative, we introduce a relaxed state-adversarial method that perturbs the policy while applying a controlled relaxation mechanism. The method improves robustness by interpolating between nominal and adversarial dynamics. Theoretically, we provide a lower bound on policy performance; empirically, we show improved results on challenging offline RL benchmarks. Our approach integrates easily with existing model-free algorithms and consistently outperforms baselines, especially in high-difficulty domains such as Adroit and AntMaze.
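The abstract describes the relaxation only at a high level. As a purely illustrative sketch, and not the authors' algorithm, one can picture a relaxed transition kernel as a convex mixture with weight α between the nominal dynamics and a worst-case transition that sends all probability mass to the lowest-value next state within the nominal support. The function name `relaxed_policy_evaluation`, the mixture form, and the support-restricted adversary are all assumptions made for this example. Setting α = 0 recovers the nominal value and α = 1 gives the fully adversarial, most pessimistic value. Intermediate α yields values between the two, which is one way to picture how relaxation can temper the underestimation a fully adversarial kernel produces.

```python
from typing import List, Sequence

# Hypothetical illustration: the function name, the convex-mixture kernel and
# the support-restricted adversary are assumptions, not the paper's method.


def relaxed_policy_evaluation(
    p_nominal: Sequence[Sequence[float]],
    rewards: Sequence[float],
    gamma: float,
    alpha: float,
    iters: int = 1000,
) -> List[float]:
    """Evaluate a fixed policy under a relaxed transition kernel.

    p_nominal[s][t] is the nominal probability of moving from s to t under the
    policy. Each backup mixes the nominal expectation (weight 1 - alpha) with a
    worst-case transition (weight alpha) that puts all mass on the
    lowest-value next state inside the nominal support.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    n = len(rewards)
    values = [0.0] * n
    for _ in range(iters):
        new_values = []
        for s in range(n):
            row = p_nominal[s]
            # Expected next-state value under the nominal kernel.
            nominal = sum(p * v for p, v in zip(row, values))
            # Adversary may only pick next states the nominal kernel can reach.
            worst = min(values[t] for t in range(n) if row[t] > 0.0)
            backup = (1.0 - alpha) * nominal + alpha * worst
            new_values.append(rewards[s] + gamma * backup)
        values = new_values
    return values


if __name__ == "__main__":
    # State 0 branches 50/50 to a rewarding absorbing state (1)
    # and a zero-reward absorbing state (2).
    P = [[0.0, 0.5, 0.5], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    R = [0.0, 1.0, 0.0]
    for a in (0.0, 0.5, 1.0):
        v = relaxed_policy_evaluation(P, R, gamma=0.9, alpha=a)
        print(f"alpha={a}: V(0)={v[0]:.3f}")
    # prints V(0)=4.500, 2.250 and 0.000 for alpha = 0.0, 0.5 and 1.0
```

In this toy chain the value of state 0 decreases linearly from 4.5 to 0 as α grows, because only the branching state gives the adversary any choice.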

Cite

Text

Wang et al. "Relaxed Transition Kernels Can Cure Underestimation in Adversarial Offline Reinforcement Learning." Proceedings of the 17th Asian Conference on Machine Learning, 2025.

Markdown

[Wang et al. "Relaxed Transition Kernels Can Cure Underestimation in Adversarial Offline Reinforcement Learning." Proceedings of the 17th Asian Conference on Machine Learning, 2025.](https://mlanthology.org/acml/2025/wang2025acml-relaxed/)

BibTeX

@inproceedings{wang2025acml-relaxed,
  title     = {{Relaxed Transition Kernels Can Cure Underestimation in Adversarial Offline Reinforcement Learning}},
  author    = {Wang, Ziyu and Hsieh, Ping-Chun and Wang, Yu-Shuen and Lien, Yun-Hsuan},
  booktitle = {Proceedings of the 17th Asian Conference on Machine Learning},
  year      = {2025},
  pages     = {145--160},
  volume    = {304},
  url       = {https://mlanthology.org/acml/2025/wang2025acml-relaxed/}
}