Guide to Control: Offline Hierarchical Reinforcement Learning Using Subgoal Generation for Long-Horizon and Sparse-Reward Tasks
Abstract
Reinforcement learning (RL) has achieved considerable success in many fields, but applying it to real-world problems can be costly and risky because it requires extensive online interaction. Recently, offline RL has shown that a solution can be extracted from existing logged data without online interaction. In this work, we propose an offline hierarchical RL method, Guider (Guide to Control), that can efficiently solve long-horizon and sparse-reward tasks from offline data. The high-level policy sequentially generates subgoals that guide the agent toward the final goal, and the low-level policy learns how to reach each generated subgoal. When learning from offline data, the key is to ensure that the generated subgoals are reachable by the low-level policy. We show that high-quality subgoal generation is possible through pre-training a latent subgoal prior model. This well-regulated subgoal generation improves performance while avoiding distributional shift in offline RL by breaking down long, complex tasks into shorter, easier ones. In evaluations, Guider outperforms prior offline RL methods on long-horizon robot navigation and complex manipulation benchmarks. Our code is available at https://github.com/gckor/Guider.
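The two-level control flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: both policies below are hand-written, hypothetical stand-ins (Guider learns them from offline data and uses a pre-trained latent subgoal prior), and the names `high_level_policy`, `low_level_policy`, `rollout`, `max_subgoal_dist`, and `subgoal_interval` are assumptions made for illustration. The sketch only shows how bounding the distance to each subgoal keeps it reachable by the low-level controller within one subgoal interval.

```python
import numpy as np


def high_level_policy(state, goal, max_subgoal_dist=1.0):
    """Hypothetical stand-in for a learned subgoal generator.

    Proposes a point at most `max_subgoal_dist` away from the current
    state, in the direction of the final goal.
    """
    delta = goal - state
    dist = np.linalg.norm(delta)
    if dist <= max_subgoal_dist:
        return goal.copy()
    return state + delta / dist * max_subgoal_dist


def low_level_policy(state, subgoal, max_step=0.25):
    """Hypothetical stand-in for a learned goal-reaching controller.

    Returns a displacement of length at most `max_step` toward the subgoal.
    """
    delta = subgoal - state
    dist = np.linalg.norm(delta)
    if dist <= max_step:
        return delta
    return delta / dist * max_step


def rollout(start, goal, subgoal_interval=4, max_steps=200, tol=1e-6):
    """Run the hierarchical loop on a toy point-mass environment."""
    state = np.asarray(start, dtype=float)
    goal = np.asarray(goal, dtype=float)
    subgoals = []
    for t in range(max_steps):
        if np.linalg.norm(goal - state) <= tol:
            return state, subgoals, t
        # The high level acts only every `subgoal_interval` steps.
        if t % subgoal_interval == 0:
            subgoal = high_level_policy(state, goal)
            subgoals.append(subgoal)
        # The low level acts at every step, conditioned on the subgoal.
        state = state + low_level_policy(state, subgoal)
    return state, subgoals, max_steps


if __name__ == "__main__":
    final_state, subgoals, steps = rollout([0.0, 0.0], [3.0, 4.0])
    print(f"reached {final_state} in {steps} steps via {len(subgoals)} subgoals")
```

In this toy setting each subgoal lies within `subgoal_interval * max_step` of the current state, so the low-level controller can always reach it before the next subgoal is issued. A subgoal placed farther away would correspond to the unreachable-subgoal failure mode that the abstract identifies as the key difficulty.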
Cite
Text
Shin and Kim. "Guide to Control: Offline Hierarchical Reinforcement Learning Using Subgoal Generation for Long-Horizon and Sparse-Reward Tasks." International Joint Conference on Artificial Intelligence, 2023. doi:10.24963/IJCAI.2023/469
Markdown
[Shin and Kim. "Guide to Control: Offline Hierarchical Reinforcement Learning Using Subgoal Generation for Long-Horizon and Sparse-Reward Tasks." International Joint Conference on Artificial Intelligence, 2023.](https://mlanthology.org/ijcai/2023/shin2023ijcai-guide/) doi:10.24963/IJCAI.2023/469
BibTeX
@inproceedings{shin2023ijcai-guide,
title = {{Guide to Control: Offline Hierarchical Reinforcement Learning Using Subgoal Generation for Long-Horizon and Sparse-Reward Tasks}},
author = {Shin, Wonchul and Kim, Yusung},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2023},
pages = {4217-4225},
doi = {10.24963/IJCAI.2023/469},
url = {https://mlanthology.org/ijcai/2023/shin2023ijcai-guide/}
}