A PAC Learning Algorithm for LTL and Omega-Regular Objectives in MDPs

Abstract

Linear temporal logic (LTL) and omega-regular objectives---a superset of LTL---have seen recent use as a way to express non-Markovian objectives in reinforcement learning. We introduce a model-based probably approximately correct (PAC) learning algorithm for omega-regular objectives in Markov decision processes (MDPs). As part of the development of our algorithm, we introduce the epsilon-recurrence time: a measure of the speed at which a policy converges to the satisfaction of the omega-regular objective in the limit. We prove that our algorithm requires only a polynomial number of samples in the relevant parameters, and we perform experiments that confirm our theory.
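To give a feel for the model-based, sample-complexity-driven setting the abstract describes, here is a minimal sketch of empirical transition estimation in an MDP with a Hoeffding-style per-entry sample bound. This is an illustrative assumption, not the authors' algorithm: the function names (`hoeffding_samples`, `estimate_model`) and the simulator interface `sample_next_state(s, a)` are hypothetical, and the paper's actual bound involves the epsilon-recurrence time rather than this simple concentration argument.

```python
import math
from collections import defaultdict

def hoeffding_samples(epsilon, delta):
    """Number of i.i.d. samples so that one empirical transition
    probability (a Bernoulli mean) is within epsilon of the truth
    with probability at least 1 - delta, by Hoeffding's inequality."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))

def estimate_model(sample_next_state, states, actions, epsilon, delta):
    """Build an empirical transition model P_hat[s][a][s'] by drawing
    the same number of samples from every state-action pair of a
    simulator (hypothetical interface: sample_next_state(s, a) -> s')."""
    n = hoeffding_samples(epsilon, delta)
    P_hat = {}
    for s in states:
        P_hat[s] = {}
        for a in actions:
            counts = defaultdict(int)
            for _ in range(n):
                counts[sample_next_state(s, a)] += 1
            # Empirical frequencies; entries for unseen successors are omitted.
            P_hat[s][a] = {t: c / n for t, c in counts.items()}
    return P_hat
```

Because the per-entry sample count grows as log(1/delta)/epsilon^2 and the model has polynomially many state-action pairs, the total number of samples stays polynomial in the relevant parameters, which is the flavor of guarantee the abstract claims.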

Cite

Text

Perez et al. "A PAC Learning Algorithm for LTL and Omega-Regular Objectives in MDPs." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I19.30148

Markdown

[Perez et al. "A PAC Learning Algorithm for LTL and Omega-Regular Objectives in MDPs." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/perez2024aaai-pac/) doi:10.1609/AAAI.V38I19.30148

BibTeX

@inproceedings{perez2024aaai-pac,
  title     = {{A PAC Learning Algorithm for LTL and Omega-Regular Objectives in MDPs}},
  author    = {Perez, Mateo and Somenzi, Fabio and Trivedi, Ashutosh},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {21510--21517},
  doi       = {10.1609/AAAI.V38I19.30148},
  url       = {https://mlanthology.org/aaai/2024/perez2024aaai-pac/}
}