LTLf/LDLf Non-Markovian Rewards
Abstract
In Markov Decision Processes (MDPs), the reward obtained in a state is Markovian, i.e., depends on the last state and action. This dependency makes it difficult to reward more interesting long-term behaviors, such as always closing a door after it has been opened, or providing coffee only following a request. Extending MDPs to handle non-Markovian reward functions was the subject of two previous lines of work. Both use LTL variants to specify the reward function and then compile the new model back into a Markovian model. Building on recent progress in temporal logics over finite traces, we adopt LDLf for specifying non-Markovian rewards and provide an elegant automata construction for building a Markovian model, which extends that of previous work and offers strong minimality and compositionality guarantees.
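The core idea of the paper is that a temporally extended reward can be tracked by an automaton and paired with the MDP state, making the reward Markovian in the product. The following is a minimal sketch of that product idea, not the paper's construction: the automaton here is a hypothetical, hand-built DFA for "coffee is delivered only after it was requested", whereas in the paper the automaton would be derived from an LTLf/LDLf formula. All names (RewardDFA, product_step, the labels) are illustrative assumptions.

```python
# Sketch: compile a temporally extended reward into a Markovian one by
# pairing each MDP state with the state of a reward automaton.
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

Symbol = FrozenSet[str]          # a propositional valuation, e.g. {"request"}


@dataclass
class RewardDFA:
    """Deterministic automaton tracking progress toward a temporal reward."""
    initial: str
    accepting: FrozenSet[str]
    # transitions: (automaton state, set of true propositions) -> next state
    transitions: Dict[Tuple[str, Symbol], str]
    reward: float = 1.0

    def step(self, state: str, symbol: Symbol) -> Tuple[str, float]:
        nxt = self.transitions.get((state, symbol), state)
        r = self.reward if nxt in self.accepting else 0.0
        return nxt, r


# Hypothetical DFA: reward when coffee is delivered after a pending request.
dfa = RewardDFA(
    initial="idle",
    accepting=frozenset({"served"}),
    transitions={
        ("idle", frozenset({"request"})): "pending",
        ("pending", frozenset({"coffee"})): "served",
        ("served", frozenset()): "idle",
    },
)


# Product construction: the extended (Markovian) state is (mdp_state, dfa_state).
# The reward now depends only on the current extended state and the observed
# labels, so a standard MDP solver can be applied to the product model.
def product_step(mdp_state: str, dfa_state: str,
                 next_mdp_state: str, labels: Symbol) -> Tuple[Tuple[str, str], float]:
    next_dfa_state, r = dfa.step(dfa_state, labels)
    return (next_mdp_state, next_dfa_state), r


# Example trace: request, then coffee -> reward is released only at delivery.
state = ("kitchen", dfa.initial)
for next_mdp, labels in [("kitchen", frozenset({"request"})),
                         ("counter", frozenset({"coffee"}))]:
    state, r = product_step(state[0], state[1], next_mdp, labels)
    print(state, r)
```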
Cite
Text
Brafman et al. "LTLf/LDLf Non-Markovian Rewards." AAAI Conference on Artificial Intelligence, 2018. doi:10.1609/AAAI.V32I1.11572
Markdown
[Brafman et al. "LTLf/LDLf Non-Markovian Rewards." AAAI Conference on Artificial Intelligence, 2018.](https://mlanthology.org/aaai/2018/brafman2018aaai-ltlf/) doi:10.1609/AAAI.V32I1.11572
BibTeX
@inproceedings{brafman2018aaai-ltlf,
title = {{LTLf/LDLf Non-Markovian Rewards}},
author = {Brafman, Ronen I. and De Giacomo, Giuseppe and Patrizi, Fabio},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2018},
pages = {1771--1778},
doi = {10.1609/AAAI.V32I1.11572},
url = {https://mlanthology.org/aaai/2018/brafman2018aaai-ltlf/}
}