Learning and Planning with Timing Information in Markov Decision Processes

Abstract

We consider the problem of learning and planning in Markov decision processes with temporally extended actions represented in the options framework. We propose to use predictions about the duration of extended actions to represent the state and show that this leads to a compact predictive state representation model independent of the set of primitive actions. Then we develop a consistent and efficient spectral learning algorithm for such models. Using just the timing information to represent states allows for faster improvement in planning performance. We illustrate our approach with experiments in both synthetic and robot navigation domains.

Cite

Text

Bacon et al. "Learning and Planning with Timing Information in Markov Decision Processes." Conference on Uncertainty in Artificial Intelligence, 2015.

Markdown

[Bacon et al. "Learning and Planning with Timing Information in Markov Decision Processes." Conference on Uncertainty in Artificial Intelligence, 2015.](https://mlanthology.org/uai/2015/bacon2015uai-learning/)

BibTeX

@inproceedings{bacon2015uai-learning,
  title     = {{Learning and Planning with Timing Information in Markov Decision Processes}},
  author    = {Bacon, Pierre-Luc and Balle, Borja and Precup, Doina},
  booktitle = {Conference on Uncertainty in Artificial Intelligence},
  year      = {2015},
  pages     = {111--120},
  url       = {https://mlanthology.org/uai/2015/bacon2015uai-learning/}
}