Using Abstract Models of Behaviours to Automatically Generate Reinforcement Learning Hierarchies
Abstract
In this paper we present a hybrid system combining techniques from symbolic planning and reinforcement learning. Planning is used to automatically construct task hierarchies for hierarchical reinforcement learning based on abstract models of the behaviours' purpose, and to perform intelligent termination improvement when an executing behaviour is no longer appropriate. Reinforcement learning is used to produce concrete implementations of abstractly defined behaviours and to learn the best possible choice of behaviour when plans are ambiguous. Two new hierarchical reinforcement learning algorithms are presented: Planned Hierarchical Semi-Markov Q-Learning (P-HSMQ), a variant of the HSMQ algorithm (Dietterich, 2000b) which uses plan-built task hierarchies, and Teleo-Reactive Q-Learning (TRQ), a more complex algorithm which implements hierarchical reinforcement learning with teleo-reactive execution semantics (Nilsson, 1994). Each algorithm is demonstrated in a simple grid-world domain.
Cite
Text
Ryan. "Using Abstract Models of Behaviours to Automatically Generate Reinforcement Learning Hierarchies." International Conference on Machine Learning, 2002.
Markdown
[Ryan. "Using Abstract Models of Behaviours to Automatically Generate Reinforcement Learning Hierarchies." International Conference on Machine Learning, 2002.](https://mlanthology.org/icml/2002/ryan2002icml-using/)
BibTeX
@inproceedings{ryan2002icml-using,
title = {{Using Abstract Models of Behaviours to Automatically Generate Reinforcement Learning Hierarchies}},
author = {Ryan, Malcolm R. K.},
booktitle = {International Conference on Machine Learning},
year = {2002},
pages = {522-529},
url = {https://mlanthology.org/icml/2002/ryan2002icml-using/}
}