Model-Based Hierarchical Average-Reward Reinforcement Learning
Abstract
There is a growing interest in using task hierarchies to tame the complexity of reinforcement learning. In this paper, we extend the MAXQ framework to hierarchical average-reward reinforcement learning. We motivate and introduce the notions of recursive gain-optimality and hierarchical gain-optimality and show that these two coincide when a condition called Result Distribution Invariance holds. We present two model-based algorithms, HH-learning and HAH-learning, that use a predefined task hierarchy and abstraction functions and learn recursively gain-optimal policies. HH-learning can be used with any exploration policy, whereas HAH-learning has a built-in "auto-exploratory" feature. We present empirical results showing that HH-learning converges in fewer steps than the corresponding "flat" algorithm and scales better with certain aspects of the domain size. The results on HAH-learning also show that it explores the state space more effectively than HH-learning with epsilon-greedy exploration.
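Gain-optimality refers to maximizing the long-run average reward per step (the gain) rather than a discounted sum of rewards. As a hedged, flat illustration of that criterion (this is not HH-learning or HAH-learning, and it uses no task hierarchy), the sketch below runs relative value iteration on a small, known MDP to find a gain-optimal policy. Model-based average-reward learners typically estimate rewards and transition probabilities from experience and apply similar backups to the learned model. The names `relative_value_iteration` and `q_value` and the two-state example model are hypothetical, and the code assumes a unichain, aperiodic MDP so that the iteration converges.

```python
from typing import Dict, Tuple

# Hypothetical model format: model[s][a] = (expected one-step reward, {next_state: probability})
Model = Dict[int, Dict[str, Tuple[float, Dict[int, float]]]]


def q_value(reward: float, transitions: Dict[int, float], h: Dict[int, float]) -> float:
    """One-step reward plus the expected bias value h of the successor state."""
    return reward + sum(p * h[t] for t, p in transitions.items())


def relative_value_iteration(model: Model, ref_state: int = 0,
                             tol: float = 1e-9, max_iters: int = 10_000):
    """Estimate the optimal gain, bias values h, and a greedy (gain-optimal) policy.

    Assumes a unichain, aperiodic MDP; otherwise relative value iteration may not converge.
    """
    h = {s: 0.0 for s in model}
    gain = 0.0
    for _ in range(max_iters):
        # Bellman backup: (T h)(s) = max_a [ r(s, a) + sum_t p(t | s, a) * h(t) ]
        backup = {s: max(q_value(r, trans, h) for r, trans in acts.values())
                  for s, acts in model.items()}
        # At the fixed point T h = h + gain, so the change at the reference state is the gain.
        gain = backup[ref_state] - h[ref_state]
        # Renormalize so that h(ref_state) = 0, which keeps the values bounded.
        new_h = {s: backup[s] - backup[ref_state] for s in model}
        converged = max(abs(new_h[s] - h[s]) for s in model) < tol
        h = new_h
        if converged:
            break
    # Greedy policy with respect to the converged bias values.
    policy = {s: max(acts, key=lambda a: q_value(*acts[a], h))
              for s, acts in model.items()}
    return gain, h, policy


# Hypothetical two-state MDP: moving to state 1 and staying there earns 3 per step.
model: Model = {
    0: {"stay": (1.0, {0: 1.0}), "go": (0.0, {1: 1.0})},
    1: {"stay": (3.0, {1: 1.0}), "back": (0.0, {0: 1.0})},
}
gain, h, policy = relative_value_iteration(model)
print(gain, policy)  # 3.0 {0: 'go', 1: 'stay'}
```

Subtracting the reference-state value on every sweep is what distinguishes this from discounted value iteration: without a discount factor the raw values grow by roughly the gain each step, so only their differences (the bias values h) and the per-step increment (the gain) are meaningful.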
Cite
Seri and Tadepalli. "Model-Based Hierarchical Average-Reward Reinforcement Learning." International Conference on Machine Learning, 2002.
@inproceedings{seri2002icml-model,
title = {{Model-Based Hierarchical Average-Reward Reinforcement Learning}},
author = {Seri, Sandeep and Tadepalli, Prasad},
booktitle = {International Conference on Machine Learning},
year = {2002},
pages = {562--569},
url = {https://mlanthology.org/icml/2002/seri2002icml-model/}
}