Learning from Trajectories via Subgoal Discovery

Abstract

Learning to solve complex goal-oriented tasks with sparse terminal-only rewards often requires an enormous number of samples. In such cases, a set of expert trajectories can help the agent learn faster. However, Imitation Learning (IL) via supervised pre-training on these trajectories may not perform well on its own and generally requires additional fine-tuning with an expert in the loop. In this paper, we propose an approach that uses the expert trajectories and learns to decompose the complex main task into smaller sub-goals. We learn a function that partitions the state space into sub-goals, which can then be used to design an extrinsic reward function. We follow a strategy in which the agent first learns from the trajectories using IL and then switches to Reinforcement Learning (RL) using the identified sub-goals, in order to alleviate the errors of the IL step. To deal with states that are under-represented in the trajectory set, we also learn a function to modulate the sub-goal predictions. We show that our method solves complex goal-oriented tasks that other RL and IL methods in the literature, and their combinations, are unable to solve.
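To make the core idea in the abstract concrete, below is a minimal sketch of how a learned sub-goal partitioner could turn a sparse terminal-only reward into a denser shaped reward. The predictor here is a random linear stand-in, and every name (predict_subgoal, shaped_reward, K_SUBGOALS, the progress bonus) is illustrative, not taken from the paper.

```python
import numpy as np

# Hypothetical stand-in for the learned sub-goal partitioner: the paper
# learns this function from expert trajectories; here it is just a random
# linear classifier over states, for illustration only.
K_SUBGOALS = 4
STATE_DIM = 8
rng = np.random.default_rng(0)
W = rng.normal(size=(K_SUBGOALS, STATE_DIM))

def predict_subgoal(state: np.ndarray) -> int:
    """Map a state to the index of its predicted sub-goal region."""
    return int(np.argmax(W @ state))

def shaped_reward(prev_state: np.ndarray,
                  state: np.ndarray,
                  terminal_reward: float = 0.0,
                  bonus: float = 1.0) -> float:
    """Reward the agent when it advances to a later sub-goal region.

    This replaces the sparse terminal-only signal with intermediate
    rewards derived from the discovered sub-goals.
    """
    progress = predict_subgoal(state) - predict_subgoal(prev_state)
    return terminal_reward + (bonus if progress > 0 else 0.0)
```

Under the paper's strategy, a policy pre-trained with IL on the expert trajectories would then be fine-tuned with RL against a denser signal of this kind, so the agent can recover from IL errors without an expert in the loop.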

Cite

Text

Paul et al. "Learning from Trajectories via Subgoal Discovery." Neural Information Processing Systems, 2019.

Markdown

[Paul et al. "Learning from Trajectories via Subgoal Discovery." Neural Information Processing Systems, 2019.](https://mlanthology.org/neurips/2019/paul2019neurips-learning/)

BibTeX

@inproceedings{paul2019neurips-learning,
  title     = {{Learning from Trajectories via Subgoal Discovery}},
  author    = {Paul, Sujoy and van Baar, Jeroen and Roy-Chowdhury, Amit},
  booktitle = {Neural Information Processing Systems},
  year      = {2019},
  pages     = {8411--8421},
  url       = {https://mlanthology.org/neurips/2019/paul2019neurips-learning/}
}