Imitating Graph-Based Planning with Goal-Conditioned Policies

Abstract

Recently, graph-based planning algorithms have gained much attention for solving goal-conditioned reinforcement learning (RL) tasks: they provide a sequence of subgoals to reach the target goal, and the agents learn to execute subgoal-conditioned policies. However, the sample efficiency of such RL schemes remains a challenge, particularly for long-horizon tasks. To address this issue, we present a simple yet effective self-imitation scheme that distills a subgoal-conditioned policy into the target-goal-conditioned policy. Our intuition here is that to reach a target goal, an agent should pass through a subgoal, so the target-goal- and subgoal-conditioned policies should be similar to each other. We also propose a novel scheme of stochastically skipping executed subgoals in a planned path, which further improves performance. Unlike prior methods that only utilize graph-based planning in the execution phase, our method transfers knowledge from a planner along with a graph into policy learning. We empirically show that our method can significantly boost the sample efficiency of existing goal-conditioned RL methods across various long-horizon control tasks.
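The core distillation idea described in the abstract can be illustrated with a minimal sketch: the policy conditioned on the planner's next subgoal acts as a frozen teacher, and the same policy conditioned on the final target goal is regressed toward it. This is not the authors' implementation; the class and function names (`GoalConditionedPolicy`, `self_imitation_loss`), the deterministic policy architecture, and the squared-error loss are assumptions made purely for illustration.

```python
# Minimal sketch (assumed, not the paper's code) of self-imitation distillation:
# the target-goal-conditioned policy imitates the subgoal-conditioned policy.
import torch
import torch.nn as nn


class GoalConditionedPolicy(nn.Module):
    """Deterministic policy pi(s, g) -> a; architecture is an assumption."""

    def __init__(self, state_dim, goal_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state, goal):
        return self.net(torch.cat([state, goal], dim=-1))


def self_imitation_loss(policy, state, subgoal, target_goal):
    """Distill the subgoal-conditioned action into the target-goal-conditioned policy.

    The subgoal-conditioned output is treated as a frozen teacher, following the
    intuition that reaching the target goal requires passing through the planned
    subgoal. The squared-error form is an illustrative assumption.
    """
    with torch.no_grad():
        teacher_action = policy(state, subgoal)      # subgoal-conditioned "teacher"
    student_action = policy(state, target_goal)      # target-goal-conditioned "student"
    return ((student_action - teacher_action) ** 2).mean()
```

In practice this loss would be added to the usual goal-conditioned RL objective, with the subgoal supplied by the graph-based planner for the sampled state and target goal.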

Cite

Text

Kim et al. "Imitating Graph-Based Planning with Goal-Conditioned Policies." International Conference on Learning Representations, 2023.

Markdown

[Kim et al. "Imitating Graph-Based Planning with Goal-Conditioned Policies." International Conference on Learning Representations, 2023.](https://mlanthology.org/iclr/2023/kim2023iclr-imitating/)

BibTeX

@inproceedings{kim2023iclr-imitating,
  title     = {{Imitating Graph-Based Planning with Goal-Conditioned Policies}},
  author    = {Kim, Junsu and Seo, Younggyo and Ahn, Sungsoo and Son, Kyunghwan and Shin, Jinwoo},
  booktitle = {International Conference on Learning Representations},
  year      = {2023},
  url       = {https://mlanthology.org/iclr/2023/kim2023iclr-imitating/}
}