Optimal Transport for Offline Imitation Learning

Abstract

With the advent of large datasets, offline reinforcement learning (RL) is a promising framework for learning good decision-making policies without the need to interact with the real environment. However, offline RL requires the dataset to be reward-annotated, which presents practical challenges when reward engineering is difficult or when obtaining reward annotations is labor-intensive. In this paper, we introduce Optimal Transport Relabeling (OTR), an imitation learning algorithm that can automatically relabel offline data of mixed and unknown quality with rewards derived from a few good demonstrations. OTR's key idea is to use optimal transport to compute an optimal alignment between an unlabeled trajectory in the dataset and an expert demonstration, yielding a similarity measure that can be interpreted as a reward and then used by an offline RL algorithm to learn the policy. OTR is easy to implement and computationally efficient. On D4RL benchmarks, we demonstrate that OTR with a single demonstration can consistently match the performance of offline RL with ground-truth rewards.
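
To make the relabeling idea concrete, below is a minimal NumPy sketch of the general recipe the abstract describes: compute an entropically regularized optimal transport plan (via Sinkhorn iterations) between the states of an unlabeled trajectory and an expert demonstration, then turn each step's share of the transport cost into a reward. The function names, the Euclidean cost, the uniform marginals, the cost normalization, and the linear reward scaling here are illustrative assumptions; the paper's exact cost function and reward scaling may differ.

    import numpy as np

    def sinkhorn(cost, eps=0.05, n_iters=500):
        """Entropic OT (Sinkhorn) between two uniform empirical measures.
        Returns the transport plan P with row marginals 1/n and column marginals 1/m."""
        n, m = cost.shape
        a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
        K = np.exp(-cost / eps)          # Gibbs kernel
        u, v = np.ones(n), np.ones(m)
        for _ in range(n_iters):
            u = a / (K @ v + 1e-30)      # alternating marginal projections
            v = b / (K.T @ u + 1e-30)
        return u[:, None] * K * v[None, :]

    def otr_rewards(traj_obs, expert_obs, eps=0.05, scale=5.0):
        """Relabel one unlabeled trajectory with rewards from its OT alignment
        to a single expert demonstration (observations only).

        traj_obs:   (n, d) observations of a dataset trajectory
        expert_obs: (m, d) observations of an expert demonstration
        Returns an (n,) array of per-step rewards (higher = closer to expert)."""
        # Pairwise Euclidean cost between dataset states and expert states,
        # normalized to [0, 1] so the entropic regularizer eps is well scaled.
        cost = np.linalg.norm(traj_obs[:, None, :] - expert_obs[None, :, :], axis=-1)
        cost = cost / (cost.max() + 1e-8)
        plan = sinkhorn(cost, eps=eps)
        # Each step's reward is its negated share of the total transport cost:
        # steps that align cheaply with the expert receive high reward.
        return -scale * (plan * cost).sum(axis=1)

    # Toy usage: a 100-step dataset trajectory and a 60-step expert demo in R^4.
    rng = np.random.default_rng(0)
    traj = rng.normal(size=(100, 4))
    demo = rng.normal(size=(60, 4))
    rewards = otr_rewards(traj, demo)   # shape (100,), one reward per dataset step

The relabeled rewards would then be attached to the offline dataset and consumed by any standard offline RL algorithm.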

Cite

Text

Luo et al. "Optimal Transport for Offline Imitation Learning." International Conference on Learning Representations, 2023.

Markdown

[Luo et al. "Optimal Transport for Offline Imitation Learning." International Conference on Learning Representations, 2023.](https://mlanthology.org/iclr/2023/luo2023iclr-optimal/)

BibTeX

@inproceedings{luo2023iclr-optimal,
  title     = {{Optimal Transport for Offline Imitation Learning}},
  author    = {Luo, Yicheng and Jiang, Zhengyao and Cohen, Samuel and Grefenstette, Edward and Deisenroth, Marc Peter},
  booktitle = {International Conference on Learning Representations},
  year      = {2023},
  url       = {https://mlanthology.org/iclr/2023/luo2023iclr-optimal/}
}