A PAC Learning Algorithm for LTL and Omega-Regular Objectives in MDPs

Abstract

Linear temporal logic (LTL) and omega-regular objectives---a superset of LTL---have seen recent use as a way to express non-Markovian objectives in reinforcement learning. We introduce a model-based probably approximately correct (PAC) learning algorithm for omega-regular objectives in Markov decision processes (MDPs). As part of the development of our algorithm, we introduce the epsilon-recurrence time: a measure of the speed at which a policy converges to the satisfaction of the omega-regular objective in the limit. We prove that our algorithm requires only a polynomial number of samples in the relevant parameters, and we perform experiments that confirm our theory.
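To give a feel for the model-based, sample-complexity-driven setting the abstract describes, here is a minimal sketch of empirical transition estimation in an MDP with a Hoeffding-style per-entry sample bound. This is an illustrative assumption, not the authors' algorithm: the function names (`hoeffding_samples`, `estimate_model`) and the simulator interface `sample_next_state(s, a)` are hypothetical, and the paper's actual bound involves the epsilon-recurrence time rather than this simple concentration argument.

```python
import math
from collections import defaultdict

def hoeffding_samples(epsilon, delta):
    """Number of i.i.d. samples so that one empirical transition
    probability (a Bernoulli mean) is within epsilon of the truth
    with probability at least 1 - delta, by Hoeffding's inequality."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))

def estimate_model(sample_next_state, states, actions, epsilon, delta):
    """Build an empirical transition model P_hat[s][a][s'] by drawing
    the same number of samples from every state-action pair of a
    simulator (hypothetical interface: sample_next_state(s, a) -> s')."""
    n = hoeffding_samples(epsilon, delta)
    P_hat = {}
    for s in states:
        P_hat[s] = {}
        for a in actions:
            counts = defaultdict(int)
            for _ in range(n):
                counts[sample_next_state(s, a)] += 1
            # Empirical frequencies; entries for unseen successors are omitted.
            P_hat[s][a] = {t: c / n for t, c in counts.items()}
    return P_hat
```

Because the per-entry sample count grows as log(1/delta)/epsilon^2 and the model has polynomially many state-action pairs, the total number of samples stays polynomial in the relevant parameters, which is the flavor of guarantee the abstract claims.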

Cite

Text

Perez et al. "A PAC Learning Algorithm for LTL and Omega-Regular Objectives in MDPs." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I19.30148

Markdown

[Perez et al. "A PAC Learning Algorithm for LTL and Omega-Regular Objectives in MDPs." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/perez2024aaai-pac/) doi:10.1609/AAAI.V38I19.30148

BibTeX

@inproceedings{perez2024aaai-pac,
  title     = {{A PAC Learning Algorithm for LTL and Omega-Regular Objectives in MDPs}},
  author    = {Perez, Mateo and Somenzi, Fabio and Trivedi, Ashutosh},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {21510--21517},
  doi       = {10.1609/AAAI.V38I19.30148},
  url       = {https://mlanthology.org/aaai/2024/perez2024aaai-pac/}
}