The Expected-Length Model of Options
Abstract
Effective options can make reinforcement learning easier by enhancing an agent's ability to both explore in a targeted manner and plan further into the future. However, learning an appropriate model of an option's dynamics is hard, as it requires estimating a highly parameterized probability distribution. This paper introduces and motivates the Expected-Length Model (ELM) for options, an alternate model for transition dynamics. We prove ELM is a (biased) estimator of the traditional Multi-Time Model (MTM), but provide a non-vacuous bound on their deviation. We further prove that, in stochastic shortest path problems, ELM induces a value function that is sufficiently similar to the one induced by MTM, and is thus capable of supporting near-optimal behavior. We explore the practical utility of this option model experimentally, finding consistent support for the thesis that ELM is a suitable replacement for MTM. In some cases, we find ELM leads to more sample-efficient learning, especially when options are arranged in a hierarchy.
Cite
Text
Abel et al. "The Expected-Length Model of Options." International Joint Conference on Artificial Intelligence, 2019. doi:10.24963/IJCAI.2019/270
Markdown
[Abel et al. "The Expected-Length Model of Options." International Joint Conference on Artificial Intelligence, 2019.](https://mlanthology.org/ijcai/2019/abel2019ijcai-expected/) doi:10.24963/IJCAI.2019/270
BibTeX
@inproceedings{abel2019ijcai-expected,
title = {{The Expected-Length Model of Options}},
author = {Abel, David and Winder, John and desJardins, Marie and Littman, Michael L.},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2019},
pages = {1951--1958},
doi = {10.24963/IJCAI.2019/270},
url = {https://mlanthology.org/ijcai/2019/abel2019ijcai-expected/}
}