Zero-Shot Offline Imitation Learning via Optimal Transport

Abstract

Zero-shot imitation learning algorithms hold the promise of reproducing unseen behavior from as little as a single demonstration at test time. Existing practical approaches view the expert demonstration as a sequence of goals, enabling imitation with a high-level goal selector, and a low-level goal-conditioned policy. However, this framework can suffer from myopic behavior: the agent’s immediate actions towards achieving individual goals may undermine long-term objectives. We introduce a novel method that mitigates this issue by directly optimizing the occupancy matching objective that is intrinsic to imitation learning. We propose to lift a goal-conditioned value function to a distance between occupancies, which are in turn approximated via a learned world model. The resulting method can learn from offline, suboptimal data, and is capable of non-myopic, zero-shot imitation, as we demonstrate in complex, continuous benchmarks. The code is available at https://github.com/martius-lab/zilot.
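To make the occupancy-matching idea in the abstract concrete, here is a minimal sketch (not the authors' implementation) of using a goal-conditioned value function as the ground cost of an entropy-regularized optimal-transport problem between states predicted by a world-model rollout and the goals of a demonstration. The names value_fn, rollout_states, and demo_goals are hypothetical placeholders; the paper's actual objective and training procedure are more involved.

import numpy as np

def sinkhorn(cost, a, b, reg=0.1, n_iters=200):
    """Entropy-regularized OT: returns a coupling with marginals a and b."""
    K = np.exp(-cost / reg)             # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)               # scale columns to match marginal b
        u = a / (K @ v)                 # scale rows to match marginal a
    return u[:, None] * K * v[None, :]  # transport plan

def ot_imitation_cost(rollout_states, demo_goals, value_fn, reg=0.1):
    """OT distance between a model rollout and the expert's goal sequence.

    The negated goal-conditioned value -V(s, g) acts as a proxy for the
    distance (e.g., time-to-reach) between a state and a goal.
    """
    cost = np.array([[-value_fn(s, g) for g in demo_goals]
                     for s in rollout_states])
    a = np.full(len(rollout_states), 1.0 / len(rollout_states))  # uniform occupancy
    b = np.full(len(demo_goals), 1.0 / len(demo_goals))
    plan = sinkhorn(cost, a, b, reg)
    return (plan * cost).sum()          # scalar objective to minimize over actions

Matching the whole occupancy of the rollout against the whole demonstration, rather than greedily pursuing the next goal, is what avoids the myopic behavior the abstract describes; the entropic regularization merely makes the transport plan cheap to compute via Sinkhorn iterations.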

Cite

Text

Rupf et al. "Zero-Shot Offline Imitation Learning via Optimal Transport." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Rupf et al. "Zero-Shot Offline Imitation Learning via Optimal Transport." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/rupf2025icml-zeroshot/)

BibTeX

@inproceedings{rupf2025icml-zeroshot,
  title     = {{Zero-Shot Offline Imitation Learning via Optimal Transport}},
  author    = {Rupf, Thomas and Bagatella, Marco and Gürtler, Nico and Frey, Jonas and Martius, Georg},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {52345--52381},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/rupf2025icml-zeroshot/}
}