Model-Based Hierarchical Average-Reward Reinforcement Learning

Abstract

There is growing interest in using task hierarchies to tame the complexity of reinforcement learning. In this paper, we extend the MAXQ framework to hierarchical average-reward reinforcement learning. We motivate and introduce the notions of recursive gain-optimality and hierarchical gain-optimality, and show that the two coincide when a condition called Result Distribution Invariance holds. We present two model-based algorithms, HH-learning and HAH-learning, that use a predefined task hierarchy and abstraction functions to learn recursively gain-optimal policies. HH-learning can be used with any exploration policy, whereas HAH-learning has a built-in "auto-exploratory" feature. We present empirical results showing that HH-learning converges in fewer steps than the corresponding "flat" algorithm and scales better with certain aspects of the domain size. The results for HAH-learning also show that it explores the state space more effectively than HH-learning with epsilon-greedy exploration.
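The central quantity in the abstract is the gain: the long-run average reward per step of a policy. As a hedged illustration of what "gain-optimal" means for a flat MDP whose model is already known, the sketch below runs standard relative value iteration and returns the optimal gain, relative state values, and a greedy policy. This is not HH-learning or HAH-learning. It ignores the task hierarchy and abstraction functions, and it does not learn the model from experience. The function name `relative_value_iteration`, the `(probability, next_state)` transition encoding, and the two-state example MDP are assumptions made for illustration. The sketch also assumes the usual conditions under which relative value iteration converges, for instance a unichain MDP with aperiodic transitions.

```python
from typing import List, Tuple

# Hypothetical encoding: P[s][a] is a list of (probability, next_state) pairs,
# R[s][a] is the expected immediate reward for taking action a in state s.
Transitions = List[List[List[Tuple[float, int]]]]
Rewards = List[List[float]]


def relative_value_iteration(P: Transitions, R: Rewards, ref: int = 0,
                             tol: float = 1e-9, max_iter: int = 10_000):
    """Sketch of relative value iteration for average-reward MDPs.

    Assumes a unichain, aperiodic MDP. Returns (gain, h, policy), where gain
    estimates the optimal average reward per step and h holds relative values
    normalised so that h[ref] == 0.
    """
    n = len(P)
    h = [0.0] * n
    gain = 0.0
    for _ in range(max_iter):
        # Undiscounted Bellman backup: max_a [ r(s,a) + sum_s' p(s'|s,a) h(s') ]
        th = [max(R[s][a] + sum(p * h[t] for p, t in P[s][a])
                  for a in range(len(P[s])))
              for s in range(n)]
        # The backed-up value at the reference state estimates the gain.
        gain = th[ref]
        new_h = [v - gain for v in th]
        converged = max(abs(x - y) for x, y in zip(new_h, h)) < tol
        h = new_h
        if converged:
            break
    # Greedy policy with respect to the converged relative values.
    policy = [max(range(len(P[s])),
                  key=lambda a: R[s][a] + sum(p * h[t] for p, t in P[s][a]))
              for s in range(n)]
    return gain, h, policy


# Hypothetical two-state example: staying in state 0 pays 1 per step,
# staying in state 1 pays 2 per step, and switching states pays 0.
P = [[[(1.0, 0)], [(1.0, 1)]],   # state 0: action 0 = stay, action 1 = move to 1
     [[(1.0, 1)], [(1.0, 0)]]]   # state 1: action 0 = stay, action 1 = move to 0
R = [[1.0, 0.0],
     [2.0, 0.0]]

gain, h, policy = relative_value_iteration(P, R)
print(gain, h, policy)
```

In this toy MDP the gain-optimal policy moves from state 0 to state 1 and then stays there, so the optimal gain is 2 reward units per step even though the move itself earns nothing. A discounted criterion with a small discount factor could rank policies differently, which is one motivation for optimizing average reward directly.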

Cite

Text

Seri and Tadepalli. "Model-Based Hierarchical Average-Reward Reinforcement Learning." International Conference on Machine Learning, 2002.

Markdown

[Seri and Tadepalli. "Model-Based Hierarchical Average-Reward Reinforcement Learning." International Conference on Machine Learning, 2002.](https://mlanthology.org/icml/2002/seri2002icml-model/)

BibTeX

@inproceedings{seri2002icml-model,
  title     = {{Model-Based Hierarchical Average-Reward Reinforcement Learning}},
  author    = {Seri, Sandeep and Tadepalli, Prasad},
  booktitle = {International Conference on Machine Learning},
  year      = {2002},
  pages     = {562--569},
  url       = {https://mlanthology.org/icml/2002/seri2002icml-model/}
}