Sim-to-Lab-to-Real: Safe RL with Shielding and Generalization Guarantees

Abstract

Safety is critical for autonomous systems and remains a major obstacle to deploying learning-based policies in the real world. In this paper, we propose Sim-to-Lab-to-Real, a framework for safely closing the reality gap. To improve safety, we use a dual policy setup in which a performance policy is trained on the cumulative task reward and a backup (safety) policy is trained by solving the safety Bellman equation based on Hamilton-Jacobi reachability analysis. In Sim-to-Lab transfer, we apply a supervisory control scheme to shield unsafe actions during exploration; in Lab-to-Real transfer, we leverage the Probably Approximately Correct (PAC)-Bayes framework to provide lower bounds on the expected performance and safety of policies in unseen environments. We empirically study the proposed framework for ego-vision navigation in two types of indoor environments, one of which is photo-realistic. We also demonstrate strong generalization performance through hardware experiments in real indoor spaces with a quadrupedal robot (see https://tinyurl.com/2p9hbyf7 for video of representative trials of Real deployment).

Cite

Text

Hsu et al. "Sim-to-Lab-to-Real: Safe RL with Shielding and Generalization Guarantees." ICLR 2022 Workshops: GPL, 2022.

Markdown

[Hsu et al. "Sim-to-Lab-to-Real: Safe RL with Shielding and Generalization Guarantees." ICLR 2022 Workshops: GPL, 2022.](https://mlanthology.org/iclrw/2022/hsu2022iclrw-simtolabtoreal/)

BibTeX

@inproceedings{hsu2022iclrw-simtolabtoreal,
  title     = {{Sim-to-Lab-to-Real: Safe RL with Shielding and Generalization Guarantees}},
  author    = {Hsu, Kai-Chieh and Ren, Allen Z. and Nguyen, Duy Phuong and Majumdar, Anirudha and Fisac, Jaime Fernández},
  booktitle = {ICLR 2022 Workshops: GPL},
  year      = {2022},
  url       = {https://mlanthology.org/iclrw/2022/hsu2022iclrw-simtolabtoreal/}
}