Targeted Uncertainty Reduction in Robust MDPs

Abstract

Robust Markov decision processes (MDPs) provide a practical framework for generalizing trained agents to new environments. In this setting, the objective is to maximize performance under the worst-case model of a given uncertainty set. By construction, this raises a performance-robustness dilemma: too large an uncertainty set yields guarantees against larger disturbances at the cost of overly conservative behavior, whilst too small an uncertainty set may leave the agent over-sensitive to model misspecification. In this work, we introduce an online method that addresses the conservativeness of robust MDPs by strategically contracting the uncertainty set. First, we explicitly formulate the gradient of the robust return with respect to the uncertainty radius. This derivation enables us to prioritize where uncertainty should be reduced and yields interesting findings on the relation between the robust return and the uncertainty set. Second, we present a sampling-based algorithm that sharpens our uncertainty estimates in the directions that most affect the robust return. Third, we illustrate the effectiveness of our algorithm in a tabular environment.
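
Since this page carries only the abstract, the following is a minimal, hypothetical sketch of the quantities it mentions, not the paper's method: a tabular robust evaluation under an (s,a)-rectangular L1 ball of radius r around a nominal transition kernel, followed by a finite-difference estimate of the robust return's gradient with respect to r. The MDP, the constants, and the worst-case construction are all illustrative assumptions.

```python
import numpy as np

# Illustrative 2-state, single-action MDP (assumed, not from the paper).
gamma = 0.9
P_nom = np.array([[0.9, 0.1],   # nominal P(s' | s=0)
                  [0.2, 0.8]])  # nominal P(s' | s=1)
r_vec = np.array([1.0, 0.0])    # state-dependent reward

def worst_kernel(P, v, radius):
    """Worst-case kernel in an L1 ball of size `radius`: shift radius/2
    probability mass from the highest-value next state to the lowest-value
    one (clipped so rows stay valid distributions)."""
    P = P.copy()
    lo, hi = np.argmin(v), np.argmax(v)
    for s in range(P.shape[0]):
        shift = min(radius / 2, P[s, hi], 1.0 - P[s, lo])
        P[s, hi] -= shift
        P[s, lo] += shift
    return P

def robust_value(radius, iters=500):
    """Robust value iteration for the fixed single-action policy:
    v <- r + gamma * P_worst(v) v."""
    v = np.zeros(len(r_vec))
    for _ in range(iters):
        P = worst_kernel(P_nom, v, radius)
        v = r_vec + gamma * P @ v
    return v

# Finite-difference gradient of the robust return w.r.t. the radius.
eps = 1e-4
v0, v1 = robust_value(0.1), robust_value(0.1 + eps)
grad = (v1 - v0) / eps
print("robust values:", v0)
print("d(robust return)/d(radius):", grad)
```

In this toy setup the gradient is negative, consistent with the abstract's premise that contracting the uncertainty set improves the robust return; states where the gradient is steepest would be natural targets for uncertainty reduction.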

Cite

Text

Gadot et al. "Targeted Uncertainty Reduction in Robust MDPs." NeurIPS 2023 Workshops: GenPlan, 2023.

Markdown

[Gadot et al. "Targeted Uncertainty Reduction in Robust MDPs." NeurIPS 2023 Workshops: GenPlan, 2023.](https://mlanthology.org/neuripsw/2023/gadot2023neuripsw-targeted/)

BibTeX

@inproceedings{gadot2023neuripsw-targeted,
  title     = {{Targeted Uncertainty Reduction in Robust MDPs}},
  author    = {Gadot, Uri and Wang, Kaixin and Derman, Esther and Kumar, Navdeep and Levy, Kfir and Mannor, Shie},
  booktitle = {NeurIPS 2023 Workshops: GenPlan},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/gadot2023neuripsw-targeted/}
}