Hierarchical Average Reward Reinforcement Learning

Abstract

Hierarchical reinforcement learning (HRL) is a general framework for scaling reinforcement learning (RL) to problems with large state and action spaces by using the task (or action) structure to restrict the space of policies. Prior work in HRL, including HAMs, options, MAXQ, and PHAMs, has been limited to the discrete-time discounted reward semi-Markov decision process (SMDP) model. The average reward optimality criterion has been recognized as more appropriate than the discounted framework for a wide class of continuing tasks. Although average reward RL has been studied for decades, prior work has been largely limited to flat policy representations.
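
For context, the two optimality criteria contrasted in the abstract can be written out; these are standard textbook definitions, and the symbols used here (J, g, r_t) are notation introduced for illustration rather than taken from the paper. Under the discounted criterion a policy π is valued by its expected discounted return from a state, whereas under the average reward criterion it is valued by its long-run reward per time step:

\[
J_\gamma^\pi(s) = \mathbb{E}^\pi\Big[\sum_{t=0}^{\infty} \gamma^t \, r_t \;\Big|\; s_0 = s\Big],
\qquad
g^\pi = \lim_{N \to \infty} \frac{1}{N}\, \mathbb{E}^\pi\Big[\sum_{t=0}^{N-1} r_t\Big].
\]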

Cite

Text

Ghavamzadeh and Mahadevan. "Hierarchical Average Reward Reinforcement Learning." Journal of Machine Learning Research, 2007.

Markdown

[Ghavamzadeh and Mahadevan. "Hierarchical Average Reward Reinforcement Learning." Journal of Machine Learning Research, 2007.](https://mlanthology.org/jmlr/2007/ghavamzadeh2007jmlr-hierarchical/)

BibTeX

@article{ghavamzadeh2007jmlr-hierarchical,
  title     = {{Hierarchical Average Reward Reinforcement Learning}},
  author    = {Ghavamzadeh, Mohammad and Mahadevan, Sridhar},
  journal   = {Journal of Machine Learning Research},
  year      = {2007},
  pages     = {2629--2669},
  volume    = {8},
  url       = {https://mlanthology.org/jmlr/2007/ghavamzadeh2007jmlr-hierarchical/}
}