Solving Robust MDPs Through No-Regret Dynamics

Abstract

Reinforcement learning is a powerful framework for training agents to navigate different situations, but it is susceptible to changes in environmental dynamics. Because of the intricate interaction between policy and environment, it is difficult to design an algorithm that finds environmentally robust policies efficiently and handles different model parameterizations without imposing stringent assumptions on the uncertainty set of transitions. In this paper, we address both issues with a No-Regret Dynamics framework that uses policy gradient methods and iteratively approximates the worst-case environment during training, avoiding assumptions on the uncertainty set. Together with a toolbox of nonconvex online learning algorithms, we show that our framework achieves fast convergence rates across many problem settings while relaxing assumptions on the uncertainty set of transitions.
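The abstract describes a min-max training loop in which a policy player and an environment adversary each run a no-regret procedure against the other. The sketch below is only an illustration of that structure on a toy tabular MDP, with projected gradient updates for both players; the problem sizes, the box uncertainty set around a nominal kernel, the finite-difference gradients, and all step sizes are assumptions made for the example and are not the paper's algorithm.

```python
# Minimal sketch (not the paper's method): a policy player runs projected
# policy-gradient ascent while an adversary gradient-descends over the
# transition kernel to approximate the worst-case environment.
import numpy as np

S, A, H = 4, 2, 10                               # states, actions, horizon (assumed)
rng = np.random.default_rng(0)
R = rng.uniform(size=(S, A))                     # reward table (assumed)
P_nom = rng.dirichlet(np.ones(S), size=(S, A))   # nominal transitions, shape (S, A, S)
radius = 0.1                                     # box uncertainty radius (assumed)

def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / (np.arange(len(v)) + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def value(pi, P):
    """Finite-horizon value of policy pi under transitions P, uniform start state."""
    V = np.zeros(S)
    for _ in range(H):
        Q = R + P @ V                   # Q[s, a] = R[s, a] + sum_s' P[s, a, s'] V[s']
        V = (pi * Q).sum(axis=1)
    return V.mean()

def finite_diff_grad(f, X, eps=1e-4):
    """Crude finite-difference gradient; enough for a toy illustration."""
    G = np.zeros_like(X)
    base = f(X)
    it = np.nditer(X, flags=["multi_index"])
    for _ in it:
        idx = it.multi_index
        Xp = X.copy(); Xp[idx] += eps
        G[idx] = (f(Xp) - base) / eps
    return G

pi = np.full((S, A), 1.0 / A)           # policy player iterate
P = P_nom.copy()                        # adversary iterate
eta_pi, eta_P = 0.5, 0.5                # step sizes (assumed)

for t in range(200):
    # Policy player: gradient ascent on the value, projected onto the simplex.
    g_pi = finite_diff_grad(lambda x: value(x, P), pi)
    pi = np.apply_along_axis(project_simplex, 1, pi + eta_pi * g_pi)

    # Adversary: gradient descent on the value, clipped to the box around the
    # nominal kernel, then renormalized so each row remains a distribution.
    g_P = finite_diff_grad(lambda x: value(pi, x), P)
    P = np.clip(P - eta_P * g_P, P_nom - radius, P_nom + radius)
    P = np.maximum(P, 1e-8)
    P = P / P.sum(axis=-1, keepdims=True)

print("robust value estimate:", value(pi, P))
```

The clip-and-renormalize step is a heuristic stand-in for projection onto a transition uncertainty set; the point of the sketch is only the alternating no-regret structure, where each player's sublinear regret against the other drives the pair toward an approximate robust solution.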

Cite

Text

Guha. "Solving Robust MDPs Through No-Regret Dynamics." Transactions on Machine Learning Research, 2024.

Markdown

[Guha. "Solving Robust MDPs Through No-Regret Dynamics." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/guha2024tmlr-solving/)

BibTeX

@article{guha2024tmlr-solving,
  title     = {{Solving Robust MDPs Through No-Regret Dynamics}},
  author    = {Guha, Etash Kumar},
  journal   = {Transactions on Machine Learning Research},
  year      = {2024},
  url       = {https://mlanthology.org/tmlr/2024/guha2024tmlr-solving/}
}