Mo' States Mo' Problems: Emergency Stop Mechanisms from Observation

Abstract

In many environments, accomplishing a given task requires only a relatively small subset of the complete state space. We develop a simple technique using emergency stops (e-stops) to exploit this phenomenon. Using e-stops significantly improves sample complexity by reducing the amount of required exploration, while retaining a performance bound that trades off the rate of convergence against a small asymptotic sub-optimality gap. We analyze the regret behavior of e-stops and present empirical results in discrete and continuous settings demonstrating that our reset mechanism can provide order-of-magnitude speedups on top of existing reinforcement learning methods.
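To make the idea concrete, here is a minimal sketch of an e-stop as an environment wrapper that ends the episode once the agent leaves a set of allowed states. It is an illustration under stated assumptions, not the authors' implementation: `ChainEnv`, `EStopWrapper`, `visited_states`, the `stop_reward` parameter, and the choice of building the allowed set from a single hand-written demonstration trajectory are all hypothetical.

```python
import random


class ChainEnv:
    """Hypothetical toy environment: states 0..n-1 on a line, goal at n-1."""

    def __init__(self, n=10, start=5):
        self.n, self.start, self.state = n, start, start

    def reset(self):
        self.state = self.start
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clamped to the chain.
        self.state = max(0, min(self.n - 1, self.state + action))
        done = self.state == self.n - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


class EStopWrapper:
    """Ends the episode as soon as the agent leaves the allowed state set."""

    def __init__(self, env, allowed_states, stop_reward=0.0):
        self.env = env
        self.allowed = set(allowed_states)
        self.stop_reward = stop_reward  # assumption: reward given on an e-stop

    def reset(self):
        return self.env.reset()

    def step(self, action):
        state, reward, done = self.env.step(action)
        if state not in self.allowed:
            # Emergency stop: terminate early instead of exploring further.
            return state, self.stop_reward, True
        return state, reward, done


def visited_states(env, episodes=200, max_steps=50, seed=0):
    """Run a uniformly random policy and collect every state it reaches."""
    rng = random.Random(seed)
    seen = set()
    for _ in range(episodes):
        seen.add(env.reset())
        for _ in range(max_steps):
            state, _, done = env.step(rng.choice([-1, 1]))
            seen.add(state)
            if done:
                break
    return seen


if __name__ == "__main__":
    # Assumed demonstration: an observed trajectory walking from start to goal.
    demonstration = [5, 6, 7, 8, 9]
    wrapped = EStopWrapper(ChainEnv(), allowed_states=demonstration)
    print("with e-stop:   ", sorted(visited_states(wrapped)))
    print("without e-stop:", sorted(visited_states(ChainEnv())))
```

In this toy setup the wrapped agent can step at most one state outside the demonstrated region before being reset, so exploration stays near the states that matter for the task. The cost is that any better route through a disallowed state is never found, which corresponds to the sub-optimality gap the abstract mentions.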

Cite

Text

Ainsworth et al. "Mo' States Mo' Problems: Emergency Stop Mechanisms from Observation." Neural Information Processing Systems, 2019.

Markdown

[Ainsworth et al. "Mo' States Mo' Problems: Emergency Stop Mechanisms from Observation." Neural Information Processing Systems, 2019.](https://mlanthology.org/neurips/2019/ainsworth2019neurips-mo/)

BibTeX

@inproceedings{ainsworth2019neurips-mo,
  title     = {{Mo' States Mo' Problems: Emergency Stop Mechanisms from Observation}},
  author    = {Ainsworth, Samuel and Barnes, Matt and Srinivasa, Siddhartha},
  booktitle = {Neural Information Processing Systems},
  year      = {2019},
  pages     = {15182--15192},
  url       = {https://mlanthology.org/neurips/2019/ainsworth2019neurips-mo/}
}