Distributionally Robust Reinforcement Learning

Abstract

Real-world applications require RL algorithms to act safely. During the learning process, the agent is likely to execute sub-optimal actions that may drive the system into unsafe or poor states. Exploration is particularly brittle in high-dimensional state/action spaces due to the increased number of low-performing actions. In this work, we consider risk-averse exploration in the approximate RL setting. To ensure safety during learning, we propose a distributionally robust policy iteration scheme that provides a tight lower-bound guarantee on state values. Our approach induces a dynamic level of risk to prevent poor decisions while still preserving convergence to the optimal policy. The formulation yields a tractable algorithm that amounts to a simple re-weighting of policy actions within the standard policy iteration scheme. We extend the approach to continuous state/action spaces and present a practical algorithm, distributionally robust soft actor-critic, which implements a different exploration strategy: it acts conservatively in the short term and explores optimistically in the long run. We provide promising experimental results on continuous control tasks.
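
To give a rough sense of the re-weighting idea mentioned in the abstract, the minimal sketch below (our own illustration, not code from the paper) applies an exponential tilt toward low-value actions, which produces a pessimistic estimate that lower-bounds the standard policy value; the function name robust_value_estimate and the fixed temperature beta are assumptions made for this example.

import numpy as np

def robust_value_estimate(q_values, probs, beta=1.0):
    """Illustrative worst-case re-weighting of policy actions.

    An exponential tilt (a KL-regularized adversary) up-weights
    low-value actions, so the returned value never exceeds the
    standard expectation sum_a pi(a) Q(a). `beta` controls the
    level of risk-aversion; beta -> 0 recovers the non-robust value.
    """
    tilt = probs * np.exp(-beta * q_values)   # adversarial re-weighting
    tilt /= tilt.sum()                        # renormalize to a distribution
    return float(np.dot(tilt, q_values))      # pessimistic (lower-bound) estimate

# Example: the robust estimate is below the standard expectation.
q = np.array([1.0, 0.2, -0.5])
pi = np.array([0.5, 0.3, 0.2])
print(np.dot(pi, q), robust_value_estimate(q, pi, beta=2.0))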

Cite

Text

Smirnova et al. "Distributionally Robust Reinforcement Learning." ICML 2019 Workshops: RL4RealLife, 2019.

Markdown

[Smirnova et al. "Distributionally Robust Reinforcement Learning." ICML 2019 Workshops: RL4RealLife, 2019.](https://mlanthology.org/icmlw/2019/smirnova2019icmlw-distributionally/)

BibTeX

@inproceedings{smirnova2019icmlw-distributionally,
  title     = {{Distributionally Robust Reinforcement Learning}},
  author    = {Smirnova, Elena and Dohmatob, Elvis and Mary, Jérémie},
  booktitle = {ICML 2019 Workshops: RL4RealLife},
  year      = {2019},
  url       = {https://mlanthology.org/icmlw/2019/smirnova2019icmlw-distributionally/}
}