Relay Policy Learning: Solving Long-Horizon Tasks via Imitation and Reinforcement Learning

Gupta, Abhishek; Kumar, Vikash; Lynch, Corey; Levine, Sergey; Hausman, Karol

CoRL 2019 pp. 1025-1037

Abstract

We present relay policy learning, a method for imitation and reinforcement learning that can solve multi-stage, long-horizon robotic tasks. This general and universally applicable two-phase approach consists of an imitation learning stage, which produces goal-conditioned hierarchical policies, followed by a reinforcement learning stage in which those policies are readily improved through fine-tuning. Our method, while not necessarily perfect at imitation learning, is very amenable to further improvement via environment interaction, allowing it to scale to challenging long-horizon tasks. In particular, we simplify the long-horizon policy learning problem by using a novel data-relabeling algorithm for learning goal-conditioned hierarchical policies, where the low-level policy acts only for a fixed number of steps, regardless of the goal achieved. While we rely on demonstration data to bootstrap policy learning, we do not assume access to demonstrations of specific tasks. Instead, our approach can leverage unstructured and unsegmented demonstrations of semantically meaningful behaviors, which are not only less burdensome to provide but can also greatly facilitate further improvement using reinforcement learning. We demonstrate the effectiveness of our method on a number of multi-stage, long-horizon manipulation tasks in a challenging kitchen simulation environment.
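The data-relabeling idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the algorithm, so everything below is an assumption made for illustration: the function names `relabel_low_level` and `relabel_high_level`, the window parameters `low_window` and `high_window`, and the tuple layout are hypothetical. The sketch assumes that every state reached within a fixed window of an unsegmented demonstration is treated as a goal in hindsight, and that the high-level subgoal is the state reached once the low level's fixed step budget runs out.

```python
from typing import List, Sequence, Tuple

State = Tuple[float, ...]
Action = Tuple[float, ...]


def relabel_low_level(
    states: Sequence[State], actions: Sequence[Action], low_window: int
) -> List[Tuple[State, State, Action]]:
    """Build (state, goal, action) tuples for a goal-conditioned low level.

    Assumption: each state reached within `low_window` steps of time t
    is used as a hindsight goal for the action taken at time t.
    """
    data = []
    for t in range(len(actions)):
        for w in range(1, low_window + 1):
            if t + w >= len(states):
                break  # ran past the end of the demonstration
            data.append((states[t], states[t + w], actions[t]))
    return data


def relabel_high_level(
    states: Sequence[State], low_window: int, high_window: int
) -> List[Tuple[State, State, State]]:
    """Build (state, goal, subgoal) tuples for a goal-conditioned high level.

    Assumption: the subgoal is the state reached after the low level's
    fixed budget of `low_window` steps, or the goal itself if it is nearer.
    """
    data = []
    for t in range(len(states)):
        for w in range(1, high_window + 1):
            if t + w >= len(states):
                break  # ran past the end of the demonstration
            subgoal = states[t + min(w, low_window)]
            data.append((states[t], states[t + w], subgoal))
    return data


if __name__ == "__main__":
    # Toy one-dimensional demonstration: 4 states, 3 actions.
    demo_states = [(0.0,), (1.0,), (2.0,), (3.0,)]
    demo_actions = [(0.1,), (0.2,), (0.3,)]
    low = relabel_low_level(demo_states, demo_actions, low_window=2)
    high = relabel_high_level(demo_states, low_window=2, high_window=3)
    print(len(low), len(high))
```

Under these assumptions, a single unsegmented demonstration yields many goal-conditioned training tuples without any task labels, which is consistent with the abstract's point that demonstrations of specific tasks are not required. The resulting tuples could then be used for supervised (behavior-cloning style) training of each level before reinforcement learning fine-tuning.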

Cite

Text

Gupta et al. "Relay Policy Learning: Solving Long-Horizon Tasks via Imitation and Reinforcement Learning." Conference on Robot Learning, 2019.

Markdown

[Gupta et al. "Relay Policy Learning: Solving Long-Horizon Tasks via Imitation and Reinforcement Learning." Conference on Robot Learning, 2019.](https://mlanthology.org/corl/2019/gupta2019corl-relay/)

BibTeX

@inproceedings{gupta2019corl-relay,
  title     = {{Relay Policy Learning: Solving Long-Horizon Tasks via Imitation and Reinforcement Learning}},
  author    = {Gupta, Abhishek and Kumar, Vikash and Lynch, Corey and Levine, Sergey and Hausman, Karol},
  booktitle = {Conference on Robot Learning},
  year      = {2019},
  pages     = {1025-1037},
  volume    = {100},
  url       = {https://mlanthology.org/corl/2019/gupta2019corl-relay/}
}