Continuous-Time Hierarchical Reinforcement Learning

Abstract

Hierarchical reinforcement learning (RL) is a general framework that studies how to exploit the structure of actions and tasks to accelerate policy learning in large domains. Prior work in hierarchical RL, such as the MAXQ method, has been limited to the discrete-time discounted reward semi-Markov decision process (SMDP) model. This paper generalizes the MAXQ method to continuous-time discounted and average reward SMDP models. We describe two hierarchical reinforcement learning algorithms: continuous-time discounted reward MAXQ and continuous-time average reward MAXQ. We apply these algorithms to a complex multiagent AGV scheduling problem, and compare their performance and speed with each other and with several well-known AGV scheduling heuristics.

Cite

Text

Ghavamzadeh and Mahadevan. "Continuous-Time Hierarchical Reinforcement Learning." International Conference on Machine Learning, 2001.

Markdown

[Ghavamzadeh and Mahadevan. "Continuous-Time Hierarchical Reinforcement Learning." International Conference on Machine Learning, 2001.](https://mlanthology.org/icml/2001/ghavamzadeh2001icml-continuous/)

BibTeX

@inproceedings{ghavamzadeh2001icml-continuous,
  title     = {{Continuous-Time Hierarchical Reinforcement Learning}},
  author    = {Ghavamzadeh, Mohammad and Mahadevan, Sridhar},
  booktitle = {International Conference on Machine Learning},
  year      = {2001},
  pages     = {186--193},
  url       = {https://mlanthology.org/icml/2001/ghavamzadeh2001icml-continuous/}
}