Learning Control Policies for Stochastic Systems with Reach-Avoid Guarantees

Zikelic, Dorde; Lechner, Mathias; Henzinger, Thomas A.; Chatterjee, Krishnendu

doi:10.1609/AAAI.V37I10.26407

AAAI 2023 pp. 11926-11935

Abstract

We study the problem of learning controllers for discrete-time non-linear stochastic dynamical systems with formal reach-avoid guarantees. This work presents the first method for providing formal reach-avoid guarantees, which combine and generalize stability and safety guarantees, with a tolerable probability threshold p in [0,1] over the infinite time horizon. Our method leverages advances in the machine learning literature and represents formal certificates as neural networks. In particular, we learn a certificate in the form of a reach-avoid supermartingale (RASM), a novel notion that we introduce in this work. Our RASMs provide reachability and avoidance guarantees by imposing constraints on what can be viewed as a stochastic extension of level sets of Lyapunov functions for deterministic systems. Our approach solves several important problems -- it can be used to learn a control policy from scratch, to verify a reach-avoid specification for a fixed control policy, or to fine-tune a pre-trained policy if it does not satisfy the reach-avoid specification. We validate our approach on three stochastic non-linear reinforcement learning tasks.
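
The abstract does not spell out the RASM constraints, so the following Python sketch is only an illustration under stated assumptions. It assumes a certificate V must be non-negative, at most 1 on initial states, at least 1/(1-p) on unsafe states, and must decrease in expectation by at least some eps outside the target set wherever V is at most 1/(1-p). The 1-D dynamics, the sets, the candidate function, and every name in the code are hypothetical and not taken from the paper. The check is a grid-based sanity test with averaged noise samples, not the formal verification procedure the paper describes.

```python
# Toy illustration only: all dynamics, sets, and constants below are assumptions.
P = 0.5                    # tolerable probability threshold p
BOUND = 1.0 / (1.0 - P)    # assumed lower bound for V on unsafe states
EPS = 0.1                  # assumed required expected decrease
TOL = 1e-9                 # slack for floating-point comparisons


def step(x, w):
    """Hypothetical closed-loop dynamics: contraction plus additive noise w."""
    return 0.5 * x + w


def in_init(x):
    return abs(x) <= 0.4


def in_target(x):
    return abs(x) <= 0.2


def in_unsafe(x):
    return abs(x) >= 0.9


# Midpoints of 100 equal bins of a uniform noise distribution on [-0.05, 0.05].
NOISE = [-0.05 + 0.1 * (i + 0.5) / 100 for i in range(100)]
# Evenly spaced states covering the state space [-1, 1].
GRID = [i / 100 - 1.0 for i in range(201)]


def expected_next(V, x):
    """Approximate E[V(step(x, w))] by averaging over the noise midpoints."""
    return sum(V(step(x, w)) for w in NOISE) / len(NOISE)


def check_conditions(V, grid=GRID):
    """Return a list of (condition, state) pairs violated on the grid."""
    violations = []
    for x in grid:
        v = V(x)
        if v < -TOL:
            violations.append(("nonnegativity", x))
        if in_init(x) and v > 1.0 + TOL:
            violations.append(("initial", x))
        if in_unsafe(x) and v < BOUND - TOL:
            violations.append(("safety", x))
        # Expected decrease is only required outside the target set
        # and below the unsafe-level bound.
        needs_decrease = not in_target(x) and v <= BOUND
        if needs_decrease and v < expected_next(V, x) + EPS - TOL:
            violations.append(("decrease", x))
    return violations


def candidate(x):
    """Hand-picked candidate certificate for the toy system."""
    return 2.5 * abs(x)


def too_flat(x):
    """A candidate that stays too small on the unsafe set."""
    return abs(x)
```

Calling `check_conditions(candidate)` returns the violations found on the grid, and `check_conditions(too_flat)` shows how a function that is too small on unsafe states fails the assumed safety condition. Passing such a grid check gives no formal guarantee; it only indicates what the constraints ask of a certificate.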

Cite

Text

Zikelic et al. "Learning Control Policies for Stochastic Systems with Reach-Avoid Guarantees." AAAI Conference on Artificial Intelligence, 2023. doi:10.1609/AAAI.V37I10.26407

Markdown

[Zikelic et al. "Learning Control Policies for Stochastic Systems with Reach-Avoid Guarantees." AAAI Conference on Artificial Intelligence, 2023.](https://mlanthology.org/aaai/2023/zikelic2023aaai-learning/) doi:10.1609/AAAI.V37I10.26407

BibTeX

@inproceedings{zikelic2023aaai-learning,
  title     = {{Learning Control Policies for Stochastic Systems with Reach-Avoid Guarantees}},
  author    = {Zikelic, Dorde and Lechner, Mathias and Henzinger, Thomas A. and Chatterjee, Krishnendu},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2023},
  pages     = {11926-11935},
  doi       = {10.1609/AAAI.V37I10.26407},
  url       = {https://mlanthology.org/aaai/2023/zikelic2023aaai-learning/}
}