One-Shot Imitation with Skill Chaining Using a Goal-Conditioned Policy in Long-Horizon Control
Abstract
Recent advances in skill learning from a task-agnostic offline dataset enable an agent to acquire various skills that can be used as primitives for long-horizon imitation. However, most existing work implicitly assumes that the offline dataset covers the entire distribution of target demonstrations. If the dataset only contains subtask-local trajectories, existing methods fail to imitate the transitions between subtasks without a sufficient amount of target demonstrations, significantly limiting their scalability. In this work, we show that a simple goal-conditioned policy can imitate the missing transitions using only the target demonstrations. We combine it with a policy-switching strategy that invokes the skills when they are applicable. Furthermore, we present several criteria for effectively evaluating the applicability of skills. Our new method successfully performs one-shot imitation with skills learned from a subtask-local offline dataset. We experimentally show that it outperforms other one-shot imitation methods in a challenging kitchen environment, and we also qualitatively analyze how each policy-switching strategy behaves during imitation.
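The abstract describes switching between learned skills and a goal-conditioned fallback policy based on whether a skill is currently applicable. Below is a minimal, hypothetical sketch of that switching logic, assuming each skill exposes a policy and an applicability score (e.g., a density or discriminator estimate over the states it was trained on); the names `Skill`, `goal_policy`, and `select_action` are illustrative assumptions, not the authors' actual interface.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

State = Sequence[float]
Action = Sequence[float]

@dataclass
class Skill:
    policy: Callable[[State], Action]          # maps state -> action within the subtask
    applicability: Callable[[State], float]    # score in [0, 1]: how "in-distribution" the state is

def select_action(state: State,
                  skills: List[Skill],
                  goal_policy: Callable[[State, State], Action],
                  goal: State,
                  threshold: float = 0.5) -> Action:
    """Switch between learned skills and a goal-conditioned policy.

    A skill is executed only when its applicability score exceeds a threshold;
    otherwise the goal-conditioned policy imitates the missing transition
    toward the next subgoal taken from the single target demonstration.
    """
    best = max(skills, key=lambda s: s.applicability(state))
    if best.applicability(state) >= threshold:
        return best.policy(state)        # in-distribution: reuse the offline-learned skill
    return goal_policy(state, goal)      # out-of-distribution: fall back to the goal-conditioned policy
```

In this sketch the threshold and the form of the applicability score are free design choices; the paper's "multiple choices" for evaluating applicability would slot in as different implementations of the `applicability` function.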
Cite
Text
Watahiki and Tsuruoka. "One-Shot Imitation with Skill Chaining Using a Goal-Conditioned Policy in Long-Horizon Control." ICLR 2022 Workshops: GPL, 2022.
Markdown
[Watahiki and Tsuruoka. "One-Shot Imitation with Skill Chaining Using a Goal-Conditioned Policy in Long-Horizon Control." ICLR 2022 Workshops: GPL, 2022.](https://mlanthology.org/iclrw/2022/watahiki2022iclrw-oneshot/)
BibTeX
@inproceedings{watahiki2022iclrw-oneshot,
title = {{One-Shot Imitation with Skill Chaining Using a Goal-Conditioned Policy in Long-Horizon Control}},
author = {Watahiki, Hayato and Tsuruoka, Yoshimasa},
booktitle = {ICLR 2022 Workshops: GPL},
year = {2022},
url = {https://mlanthology.org/iclrw/2022/watahiki2022iclrw-oneshot/}
}