Diverse Offline Imitation Learning

Abstract

There has been significant recent progress in the area of unsupervised skill discovery, utilizing various information-theoretic objectives as measures of diversity. Despite these advances, challenges remain: current methods require significant online interaction, fail to leverage vast amounts of available task-agnostic data, and typically lack a quantitative measure of skill utility. We address these challenges by proposing a principled offline algorithm for unsupervised skill discovery that, in addition to maximizing diversity, ensures that each learned skill imitates state-only expert demonstrations to a certain degree. Our main analytical contribution is to connect Fenchel duality, reinforcement learning, and unsupervised skill discovery to maximize a mutual information objective subject to KL-divergence state occupancy constraints. Furthermore, we demonstrate the effectiveness of our method on the standard offline benchmark D4RL and on a custom offline dataset collected from a 12-DoF quadruped robot, where policies trained in simulation transfer well to the real robotic system.
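The abstract describes maximizing a mutual information objective under KL-divergence state occupancy constraints. A minimal sketch of such a constrained problem is given below; the notation (skill variable $z$, skill-conditioned state occupancy $d^{\pi_z}$, expert state occupancy $d^{E}$, tolerance $\varepsilon$) is illustrative and not necessarily the paper's own:

```latex
% Hedged sketch of the constrained skill-discovery objective:
% maximize diversity, measured as mutual information I(s; z) between
% states s and skills z, while keeping each skill's state occupancy
% d^{\pi_z} within a KL ball of radius \varepsilon around the expert
% state occupancy d^E. Symbols here are assumptions for illustration.
\begin{equation*}
  \max_{\pi}\; I(s; z)
  \quad \text{s.t.} \quad
  D_{\mathrm{KL}}\!\left( d^{\pi_z} \,\middle\|\, d^{E} \right) \le \varepsilon
  \quad \forall z .
\end{equation*}
```

Per the abstract, Fenchel duality is the tool that connects a constrained occupancy problem of this kind to a reinforcement learning formulation that can be solved offline.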

Cite

Text

Vlastelica et al. "Diverse Offline Imitation Learning." NeurIPS 2023 Workshops: ALOE, 2023.

Markdown

[Vlastelica et al. "Diverse Offline Imitation Learning." NeurIPS 2023 Workshops: ALOE, 2023.](https://mlanthology.org/neuripsw/2023/vlastelica2023neuripsw-diverse/)

BibTeX

@inproceedings{vlastelica2023neuripsw-diverse,
  title     = {{Diverse Offline Imitation Learning}},
  author    = {Vlastelica, Marin and Cheng, Jin and Martius, Georg and Kolev, Pavel},
  booktitle = {NeurIPS 2023 Workshops: ALOE},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/vlastelica2023neuripsw-diverse/}
}