Towards Safe Policy Improvement for Non-Stationary MDPs

Abstract

Many real-world sequential decision-making problems involve critical systems with financial risks and human-life risks. While several works in the past have proposed methods that are safe for deployment, they assume that the underlying problem is stationary. However, many real-world problems of interest exhibit non-stationarity, and when stakes are high, the cost associated with a false stationarity assumption may be unacceptable. We take the first steps towards ensuring safety, with high confidence, for smoothly-varying non-stationary decision problems. Our proposed method extends a type of safe algorithm, called a Seldonian algorithm, through a synthesis of model-free reinforcement learning with time-series analysis. Safety is ensured using sequential hypothesis testing of a policy’s forecasted performance, and confidence intervals are obtained using wild bootstrap.

Cite

Text

Chandak et al. "Towards Safe Policy Improvement for Non-Stationary MDPs." Neural Information Processing Systems, 2020.

Markdown

[Chandak et al. "Towards Safe Policy Improvement for Non-Stationary MDPs." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/chandak2020neurips-safe/)

BibTeX

@inproceedings{chandak2020neurips-safe,
  title     = {{Towards Safe Policy Improvement for Non-Stationary MDPs}},
  author    = {Chandak, Yash and Jordan, Scott and Theocharous, Georgios and White, Martha and Thomas, Philip S.},
  booktitle = {Neural Information Processing Systems},
  year      = {2020},
  url       = {https://mlanthology.org/neurips/2020/chandak2020neurips-safe/}
}