Safe Exploration and Optimization of Constrained MDPs Using Gaussian Processes

Abstract

We present a reinforcement learning approach to explore and optimize a safety-constrained Markov Decision Process (MDP). In this setting, the agent must maximize discounted cumulative reward while constraining the probability of entering unsafe states, where a state is considered safe if the value of a safety function at that state lies within some tolerance. The safety values of the states are not known a priori, and we model them probabilistically via a Gaussian Process (GP) prior. Behaving properly in such an environment therefore requires balancing a three-way trade-off: exploring the safety function, exploring the reward function, and exploiting acquired knowledge to maximize reward. We propose a novel approach to balance this trade-off. Specifically, our approach explores unvisited states selectively; that is, it prioritizes the exploration of a state if visiting that state significantly improves the knowledge of the achievable cumulative reward. Our approach relies on a novel information gain criterion based on Gaussian Process representations of the reward and safety functions. We demonstrate the effectiveness of our approach on a range of experiments, including a simulation using real Martian terrain data.
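The safety model described above (a GP prior over an unknown safety function, with a state judged safe when that function clears a tolerance) can be illustrated with a minimal pure-Python sketch. It assumes a one-dimensional grid of states, a zero-mean GP with an RBF kernel, a hypothetical safety threshold `threshold` and confidence multiplier `beta`; all function names are hypothetical. A state is treated as safe when the GP lower confidence bound clears the threshold, a rule common in GP-based safe exploration, and `information_gain` is the standard GP mutual information for a single noisy observation. Neither is the paper's own information gain criterion, and the sketch ignores reachability and the reward GP entirely.

```python
import math

# Hypothetical sketch: GP safety model on a 1-D grid of states
# (zero prior mean, RBF kernel). Not the paper's algorithm.


def rbf(x, y, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel between two scalar states."""
    return variance * math.exp(-0.5 * ((x - y) / lengthscale) ** 2)


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def forward_sub(low, b):
    """Solve low @ x = b for x, where low is lower triangular."""
    x = [0.0] * len(b)
    for i in range(len(b)):
        x[i] = (b[i] - sum(low[i][k] * x[k] for k in range(i))) / low[i][i]
    return x


def gp_posterior(x_train, y_train, x_test, noise_var=0.01):
    """Posterior mean and variance of a zero-mean GP at each test state."""
    k = [[rbf(a, b) + (noise_var if i == j else 0.0)
          for j, b in enumerate(x_train)] for i, a in enumerate(x_train)]
    low = cholesky(k)
    z = forward_sub(low, y_train)  # L^{-1} y
    means, variances = [], []
    for x in x_test:
        v = forward_sub(low, [rbf(x, b) for b in x_train])  # L^{-1} k_*
        means.append(sum(vi * zi for vi, zi in zip(v, z)))
        variances.append(max(rbf(x, x) - sum(vi * vi for vi in v), 0.0))
    return means, variances


def safe_states(states, means, variances, threshold, beta=2.0):
    """States whose lower confidence bound mean - beta*std clears the threshold."""
    return [s for s, m, v in zip(states, means, variances)
            if m - beta * math.sqrt(v) >= threshold]


def information_gain(variance, noise_var=0.01):
    """Mutual information between f(s) and one noisy observation of it."""
    return 0.5 * math.log(1.0 + variance / noise_var)


if __name__ == "__main__":
    states = list(range(10))
    visited, safety_obs = [0, 1, 2], [1.0, 0.9, 0.8]
    mu, var = gp_posterior(visited, safety_obs, states)
    print("certified safe:", safe_states(states, mu, var, threshold=0.5))
    print("gain per state:", [round(information_gain(v), 2) for v in var])
```

With three visited states, only those states are certified safe, while distant unvisited states carry the highest information gain. A purely uncertainty-driven rule like this would chase every uncertain state; the approach in the abstract instead prioritizes states whose exploration improves knowledge of the achievable cumulative reward.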

Cite

Text

Wachi et al. "Safe Exploration and Optimization of Constrained MDPs Using Gaussian Processes." AAAI Conference on Artificial Intelligence, 2018. doi:10.1609/AAAI.V32I1.12103

Markdown

[Wachi et al. "Safe Exploration and Optimization of Constrained MDPs Using Gaussian Processes." AAAI Conference on Artificial Intelligence, 2018.](https://mlanthology.org/aaai/2018/wachi2018aaai-safe/) doi:10.1609/AAAI.V32I1.12103

BibTeX

@inproceedings{wachi2018aaai-safe,
  title     = {{Safe Exploration and Optimization of Constrained MDPs Using Gaussian Processes}},
  author    = {Wachi, Akifumi and Sui, Yanan and Yue, Yisong and Ono, Masahiro},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2018},
  pages     = {6548-6556},
  doi       = {10.1609/AAAI.V32I1.12103},
  url       = {https://mlanthology.org/aaai/2018/wachi2018aaai-safe/}
}