Provably Safe PAC-MDP Exploration Using Analogies

Abstract

A key challenge in applying reinforcement learning to safety-critical domains is understanding how to balance exploration (needed to attain good performance on the task) with safety (needed to avoid catastrophic failure). Although a growing line of work in reinforcement learning has investigated this area of "safe exploration," most existing techniques 1) do not guarantee safety during the actual exploration process and/or 2) limit the problem to a priori known and/or deterministic transition dynamics with strong smoothness assumptions. Addressing this gap, we propose Analogous Safe-state Exploration (ASE), an algorithm for provably safe exploration in MDPs with unknown, stochastic dynamics. Our method exploits analogies between state-action pairs to safely learn a near-optimal policy in a PAC-MDP sense. ASE also guides exploration towards the most task-relevant states, which empirically yields significant improvements in sample efficiency compared to existing methods.
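The following is a minimal, illustrative sketch of the general idea of confidence-based safe exploration with analogies between state-action pairs; it is not the authors' ASE algorithm. The analogy map `analogy_of`, the toy dynamics in `step`, and the thresholds `delta_safe` and `min_visits` are all hypothetical stand-ins chosen only to convey how evidence gathered on analogous state-action pairs can be used to decide whether an unexplored action is believed safe.

```python
# Hedged sketch: an action in a new state is only taken once an "analogous"
# state-action class has been observed to be safe with enough evidence.
# All environment details here are assumptions for illustration.
import random
from collections import defaultdict

random.seed(0)

def analogy_of(state, action):
    # Hypothetical analogy map: states sharing the same residue mod 3 are
    # treated as having analogous dynamics for the given action.
    return (state % 3, action)

counts = defaultdict(int)         # visits per analogy class
unsafe_counts = defaultdict(int)  # unsafe outcomes per analogy class
delta_safe = 0.1                  # tolerated empirical unsafe frequency (assumption)
min_visits = 20                   # evidence required before trusting an estimate

def believed_safe(state, action):
    key = analogy_of(state, action)
    if counts[key] < min_visits:
        return False  # not enough evidence from analogous pairs yet
    return unsafe_counts[key] / counts[key] <= delta_safe

def step(state, action):
    # Toy stochastic dynamics: action 1 is riskier (assumption).
    unsafe = random.random() < (0.05 if action == 0 else 0.3)
    next_state = (state + action + 1) % 9
    return next_state, unsafe

state = 0
for t in range(500):
    # Prefer actions believed safe; otherwise fall back to a default action
    # that plays the role of a known safe baseline policy.
    candidates = [a for a in (0, 1) if believed_safe(state, a)]
    action = random.choice(candidates) if candidates else 0
    next_state, unsafe = step(state, action)
    key = analogy_of(state, action)
    counts[key] += 1
    unsafe_counts[key] += int(unsafe)
    state = 0 if unsafe else next_state  # reset on (simulated) failure

print({k: round(unsafe_counts[k] / counts[k], 2) for k in counts})
```

In this toy version the fallback action stands in for a baseline policy assumed safe; the actual ASE algorithm instead provides formal safety guarantees throughout exploration in MDPs with unknown, stochastic dynamics.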

Cite

Text

Roderick et al. "Provably Safe PAC-MDP Exploration Using Analogies." Artificial Intelligence and Statistics, 2021.

Markdown

[Roderick et al. "Provably Safe PAC-MDP Exploration Using Analogies." Artificial Intelligence and Statistics, 2021.](https://mlanthology.org/aistats/2021/roderick2021aistats-provably/)

BibTeX

@inproceedings{roderick2021aistats-provably,
  title     = {{Provably Safe PAC-MDP Exploration Using Analogies}},
  author    = {Roderick, Melrose and Nagarajan, Vaishnavh and Kolter, Zico},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2021},
  pages     = {1216--1224},
  volume    = {130},
  url       = {https://mlanthology.org/aistats/2021/roderick2021aistats-provably/}
}