Potential Based Reward Shaping for Hierarchical Reinforcement Learning
Abstract
Hierarchical Reinforcement Learning (HRL) outperforms many ‘flat’ Reinforcement Learning (RL) algorithms in some application domains. However, HRL may take longer to obtain the optimal policy because of its large action space. Potential Based Reward Shaping (PBRS) has been widely used to incorporate heuristics into flat RL algorithms in order to reduce their exploration. In this paper, we investigate the integration of PBRS and HRL, and propose a new algorithm: PBRS-MAXQ-0. We prove that, under certain conditions, PBRS-MAXQ-0 is guaranteed to converge. Empirical results show that PBRS-MAXQ-0 significantly outperforms MAXQ-0 when given good heuristics, and still converges even when given misleading heuristics.
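For context (not part of the original abstract): in flat RL, PBRS augments the environment reward with a shaping term derived from a potential function \Phi over states, which takes the standard form of Ng et al. (1999):

F(s, a, s') = \gamma \Phi(s') - \Phi(s)

where \gamma is the discount factor and \Phi encodes the heuristic. The paper's contribution is extending this scheme to the MAXQ value-function decomposition while preserving convergence guarantees; the exact hierarchical formulation is detailed in the paper itself.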
Cite
Text
Gao and Toni. "Potential Based Reward Shaping for Hierarchical Reinforcement Learning." International Joint Conference on Artificial Intelligence, 2015.
Markdown
[Gao and Toni. "Potential Based Reward Shaping for Hierarchical Reinforcement Learning." International Joint Conference on Artificial Intelligence, 2015.](https://mlanthology.org/ijcai/2015/gao2015ijcai-potential/)
BibTeX
@inproceedings{gao2015ijcai-potential,
title = {{Potential Based Reward Shaping for Hierarchical Reinforcement Learning}},
author = {Gao, Yang and Toni, Francesca},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2015},
pages = {3504--3510},
url = {https://mlanthology.org/ijcai/2015/gao2015ijcai-potential/}
}