I²HRL: Interactive Influence-Based Hierarchical Reinforcement Learning
Abstract
Hierarchical reinforcement learning (HRL) is a promising approach to solving tasks with long time horizons and sparse rewards. It is often implemented as a high-level policy that assigns subgoals to a low-level policy. However, it suffers from the high-level non-stationarity problem, since the low-level policy is constantly changing. This non-stationarity also causes a data-efficiency problem: policies need more data at non-stationary states to stabilize training. To address these issues, we propose a novel HRL method: Interactive Influence-based Hierarchical Reinforcement Learning (I²HRL). First, inspired by agent modeling, we enable interaction between the low-level and high-level policies to stabilize high-level policy training: the high-level policy makes decisions conditioned on a received representation of the low-level policy as well as the state of the environment. Second, we further stabilize the high-level policy via an information-theoretic regularization that minimizes its dependence on the changing low-level policy. Third, we propose influence-based exploration to more frequently visit the non-stationary states where more transition data is needed. We experimentally validate the effectiveness of the proposed solution on several tasks in MuJoCo domains, demonstrating that our approach significantly boosts learning performance and accelerates learning compared with state-of-the-art HRL methods.
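The interaction described in the abstract can be sketched as follows: the high-level policy conditions on both the environment state and a representation of the current low-level policy (the agent-modeling idea). All class names, the use of raw weights as the policy representation, and the linear update rules below are illustrative assumptions for this sketch, not the paper's actual architecture.

```python
# Illustrative sketch only: a high-level policy that conditions on
# (state, low-level policy representation), as the abstract describes.
import random

random.seed(0)

class LowLevelPolicy:
    """Goal-conditioned low-level policy; here its weight vector doubles
    as a simple fixed-size policy representation (an assumption of this
    sketch -- the paper learns a representation rather than using raw
    weights)."""
    def __init__(self, dim):
        self.weights = [random.uniform(-1, 1) for _ in range(dim)]

    def act(self, state, subgoal):
        # Move toward the subgoal, scaled by the (changing) weights.
        return [w * (g - s) for w, s, g in zip(self.weights, state, subgoal)]

    def representation(self):
        # Agent modeling: expose an embedding of the current policy.
        return list(self.weights)

class HighLevelPolicy:
    """Chooses subgoals conditioned on the state AND the low-level policy
    representation, so its effective input reflects how the low-level
    policy is currently behaving even as that policy keeps changing."""
    def act(self, state, low_level_repr):
        # Illustrative rule: offset the state by the policy embedding.
        return [s + r for s, r in zip(state, low_level_repr)]

dim = 2
low = LowLevelPolicy(dim)
high = HighLevelPolicy()

state = [0.0, 0.0]
subgoal = high.act(state, low.representation())   # conditioned on both inputs
action = low.act(state, subgoal)                  # low level pursues the subgoal
print(len(subgoal), len(action))
```

In an actual training loop, the low-level weights would be updated between high-level decisions; because the high-level input includes the policy representation, its decision problem stays better-defined under those updates.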
Cite
Text
Wang et al. "I²HRL: Interactive Influence-Based Hierarchical Reinforcement Learning." International Joint Conference on Artificial Intelligence, 2020. doi:10.24963/IJCAI.2020/433
Markdown
[Wang et al. "I²HRL: Interactive Influence-Based Hierarchical Reinforcement Learning." International Joint Conference on Artificial Intelligence, 2020.](https://mlanthology.org/ijcai/2020/wang2020ijcai-i/) doi:10.24963/IJCAI.2020/433
BibTeX
@inproceedings{wang2020ijcai-i,
title = {{I²HRL: Interactive Influence-Based Hierarchical Reinforcement Learning}},
author = {Wang, Rundong and Yu, Runsheng and An, Bo and Rabinovich, Zinovi},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2020},
pages = {3131-3138},
doi = {10.24963/IJCAI.2020/433},
url = {https://mlanthology.org/ijcai/2020/wang2020ijcai-i/}
}