Safe Exploration and Optimization of Constrained MDPs Using Gaussian Processes

Akifumi Wachi, Yanan Sui, Yisong Yue, Masahiro Ono

AAAI 2018 pp. 6548-6556

doi:10.1609/AAAI.V32I1.12103

Abstract

We present a reinforcement learning approach to explore and optimize a safety-constrained Markov Decision Process (MDP). In this setting, the agent must maximize discounted cumulative reward while constraining the probability of entering unsafe states, where safety is defined by a safety function remaining within some tolerance. The safety values of the states are not known a priori, and we model them probabilistically via a Gaussian Process (GP) prior. Behaving properly in such an environment therefore requires balancing a three-way trade-off among exploring the safety function, exploring the reward function, and exploiting acquired knowledge to maximize reward. We propose a novel approach to balance this trade-off. Specifically, our approach explores unvisited states selectively; that is, it prioritizes the exploration of a state if visiting that state significantly improves knowledge of the achievable cumulative reward. Our approach relies on a novel information gain criterion based on Gaussian Process representations of the reward and safety functions. We demonstrate the effectiveness of our approach on a range of experiments, including a simulation using real Martian terrain data.
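To illustrate the idea of modeling unknown safety values with a GP prior, the following is a minimal sketch and not the algorithm from the paper. It assumes a one-dimensional state space, a zero-mean GP with a squared-exponential kernel, and a pessimistic rule that labels a state safe only when a lower confidence bound on its safety value meets a threshold. The names `rbf_kernel`, `gp_posterior`, and `safe_states`, and the parameters `threshold`, `beta`, `noise`, and `length_scale`, are hypothetical choices made for this example.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between two sets of 1-D state coordinates."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-2):
    """Posterior mean and standard deviation of a zero-mean GP at x_query."""
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_qo = rbf_kernel(x_query, x_obs)
    mean = k_qo @ np.linalg.solve(k_oo, y_obs)
    cov = rbf_kernel(x_query, x_query) - k_qo @ np.linalg.solve(k_oo, k_qo.T)
    # Clip tiny negative variances caused by floating-point round-off.
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return mean, std

def safe_states(x_obs, y_obs, x_query, threshold=0.0, beta=2.0):
    """Label a state safe only if its lower confidence bound meets the threshold."""
    mean, std = gp_posterior(x_obs, y_obs, x_query)
    return mean - beta * std >= threshold

# Hypothetical data: safety was observed at three neighbouring states only.
states = np.linspace(0.0, 10.0, 11)
x_obs = np.array([4.0, 5.0, 6.0])
y_obs = np.array([1.0, 1.0, 1.0])
mask = safe_states(x_obs, y_obs, states)
```

Under these assumptions, states near the observations are labeled safe because their posterior uncertainty is small, while distant states stay unlabeled until the agent gathers more safety observations. This is the exploration pressure that the trade-off described above has to balance against reward.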


Cite

Text

Wachi et al. "Safe Exploration and Optimization of Constrained MDPs Using Gaussian Processes." AAAI Conference on Artificial Intelligence, 2018. doi:10.1609/AAAI.V32I1.12103

Markdown

[Wachi et al. "Safe Exploration and Optimization of Constrained MDPs Using Gaussian Processes." AAAI Conference on Artificial Intelligence, 2018.](https://mlanthology.org/aaai/2018/wachi2018aaai-safe/) doi:10.1609/AAAI.V32I1.12103

BibTeX

@inproceedings{wachi2018aaai-safe,
  title     = {{Safe Exploration and Optimization of Constrained MDPs Using Gaussian Processes}},
  author    = {Wachi, Akifumi and Sui, Yanan and Yue, Yisong and Ono, Masahiro},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2018},
  pages     = {6548--6556},
  doi       = {10.1609/AAAI.V32I1.12103},
  url       = {https://mlanthology.org/aaai/2018/wachi2018aaai-safe/}
}