Non-Stationary Markov Decision Processes, a Worst-Case Approach Using Model-Based Reinforcement Learning

NeurIPS 2019 pp. 7216-7225

Abstract

This work tackles the problem of robust zero-shot planning in non-stationary stochastic environments. We study Markov Decision Processes (MDPs) evolving over time and consider Model-Based Reinforcement Learning algorithms in this setting. We make two hypotheses: 1) the environment evolves continuously with a bounded evolution rate; 2) the current model is known at each decision epoch, but its evolution is not. Our contribution can be presented in four points: 1) we define a specific class of MDPs that we call Non-Stationary MDPs (NSMDPs), and we introduce the notion of regular evolution by making a hypothesis of Lipschitz continuity on the transition and reward functions w.r.t. time; 2) we consider a planning agent that uses the current model of the environment but is unaware of its future evolution, which leads us to a worst-case method where the environment is seen as an adversarial agent; 3) following this approach, we propose the Risk-Averse Tree-Search (RATS) algorithm, a zero-shot Model-Based method similar to Minimax search; 4) we illustrate the benefits brought by RATS empirically and compare its performance with reference Model-Based algorithms.
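
The regular-evolution hypothesis mentioned in the abstract can be sketched as follows. This is only an illustrative formalization: the abstract does not give the notation, so the symbols $p_t$, $r_t$, $L_p$, $L_r$ and the distance $d$ between distributions are assumptions introduced here, and the paper may define them differently.

```latex
% Illustrative sketch; all symbols are assumed, not taken from the abstract.
% p_t(. | s, a): transition distribution at time t
% r_t(s, a):     reward function at time t
% d:             some distance between probability distributions
% L_p, L_r:      Lipschitz constants bounding the evolution rate
\[
  d\bigl(p_t(\cdot \mid s, a),\, p_{t'}(\cdot \mid s, a)\bigr) \le L_p \, |t - t'|,
  \qquad
  \bigl| r_t(s, a) - r_{t'}(s, a) \bigr| \le L_r \, |t - t'|
  \qquad \text{for all } s, a, t, t'.
\]
```

Under bounds of this kind, a model known at time $t$ restricts every future model at time $t'$ to a set whose size grows linearly with $|t - t'|$, which is what allows a planner to reason about the worst case over that set.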

Cite

Text

Lecarpentier and Rachelson. "Non-Stationary Markov Decision Processes, a Worst-Case Approach Using Model-Based Reinforcement Learning." Neural Information Processing Systems, 2019.

Markdown

[Lecarpentier and Rachelson. "Non-Stationary Markov Decision Processes, a Worst-Case Approach Using Model-Based Reinforcement Learning." Neural Information Processing Systems, 2019.](https://mlanthology.org/neurips/2019/lecarpentier2019neurips-nonstationary/)

BibTeX

@inproceedings{lecarpentier2019neurips-nonstationary,
  title     = {{Non-Stationary Markov Decision Processes, a Worst-Case Approach Using Model-Based Reinforcement Learning}},
  author    = {Lecarpentier, Erwan and Rachelson, Emmanuel},
  booktitle = {Neural Information Processing Systems},
  year      = {2019},
  pages     = {7216-7225},
  url       = {https://mlanthology.org/neurips/2019/lecarpentier2019neurips-nonstationary/}
}