Imitating Graph-Based Planning with Goal-Conditioned Policies
Abstract
Recently, graph-based planning algorithms have gained much attention for solving goal-conditioned reinforcement learning (RL) tasks: they provide a sequence of subgoals to reach the target goal, and the agents learn to execute subgoal-conditioned policies. However, the sample-efficiency of such RL schemes remains a challenge, particularly for long-horizon tasks. To address this issue, we present a simple yet effective self-imitation scheme which distills a subgoal-conditioned policy into the target-goal-conditioned policy. Our intuition here is that to reach a target goal, an agent should pass through a subgoal, so target-goal- and subgoal-conditioned policies should be similar to each other. We also propose a novel scheme of stochastically skipping executed subgoals in a planned path, which further improves performance. Unlike prior methods that only utilize graph-based planning in an execution phase, our method transfers knowledge from a planner along with a graph into policy learning. We empirically show that our method can significantly boost the sample-efficiency of the existing goal-conditioned RL methods under various long-horizon control tasks.
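To illustrate the self-imitation idea described above, the following is a minimal sketch (not the authors' released code) of distilling a subgoal-conditioned policy into the target-goal-conditioned policy. It assumes a deterministic goal-conditioned actor and an MSE-based distillation loss; the class and function names are hypothetical and purely illustrative.

```python
import torch
import torch.nn as nn


class GoalConditionedPolicy(nn.Module):
    """Deterministic actor pi(a | s, g): maps a state-goal pair to an action."""

    def __init__(self, state_dim: int, goal_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state: torch.Tensor, goal: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, goal], dim=-1))


def self_imitation_loss(policy: GoalConditionedPolicy,
                        state: torch.Tensor,
                        subgoal: torch.Tensor,
                        target_goal: torch.Tensor) -> torch.Tensor:
    """Pull the target-goal-conditioned action toward the subgoal-conditioned
    action, treating the planner's subgoal as a fixed 'teacher' signal."""
    with torch.no_grad():                        # teacher: action conditioned on the planner's subgoal
        teacher_action = policy(state, subgoal)
    student_action = policy(state, target_goal)  # student: action conditioned on the final target goal
    return ((student_action - teacher_action) ** 2).mean()
```

In practice such a distillation term would be added to the usual goal-conditioned RL objective, with the subgoal supplied by the graph-based planner for each sampled transition; the weighting between the two terms is a design choice not shown here.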
Cite
Text
Kim et al. "Imitating Graph-Based Planning with Goal-Conditioned Policies." International Conference on Learning Representations, 2023.

Markdown

[Kim et al. "Imitating Graph-Based Planning with Goal-Conditioned Policies." International Conference on Learning Representations, 2023.](https://mlanthology.org/iclr/2023/kim2023iclr-imitating/)

BibTeX
@inproceedings{kim2023iclr-imitating,
title = {{Imitating Graph-Based Planning with Goal-Conditioned Policies}},
author = {Kim, Junsu and Seo, Younggyo and Ahn, Sungsoo and Son, Kyunghwan and Shin, Jinwoo},
booktitle = {International Conference on Learning Representations},
year = {2023},
url = {https://mlanthology.org/iclr/2023/kim2023iclr-imitating/}
}