Versatile Offline Imitation Learning via State-Occupancy Matching

Abstract

We propose State Matching Offline DIstribution Correction Estimation (SMODICE), a novel and versatile algorithm for offline imitation learning (IL) via state-occupancy matching. Without requiring access to expert actions, SMODICE can be effectively applied to three offline IL settings: (i) imitation from observations (IfO), (ii) IfO with a dynamics- or morphologically-mismatched expert, and (iii) example-based reinforcement learning, which we show can be formulated as a state-occupancy matching problem. We show that the SMODICE objective admits a simple optimization procedure through an application of Fenchel duality, reducing a nested optimization problem to a sequence of stable supervised learning problems. We extensively evaluate SMODICE on both gridworld environments and high-dimensional offline benchmarks. Our results demonstrate that SMODICE is effective in all three problem settings and significantly outperforms the prior state of the art.
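The abstract's key idea, matching state occupancies without expert actions, is commonly instantiated by training a discriminator over states and using its log-odds as a surrogate reward. The sketch below is an illustrative assumption, not the paper's implementation: it fits a tiny logistic discriminator to separate toy expert states from offline states, then defines a reward r(s) = log d(s) - log(1 - d(s)) that estimates the log-ratio of the two state occupancies. All data and names here are hypothetical.

```python
import numpy as np

# Hypothetical 1-D toy data: expert states cluster near s = 2, offline data is broad.
rng = np.random.default_rng(0)
expert_states = rng.normal(2.0, 0.3, size=500)
offline_states = rng.normal(0.0, 1.0, size=500)

X = np.concatenate([expert_states, offline_states])
y = np.concatenate([np.ones(500), np.zeros(500)])  # label 1 = expert state

# Plain logistic regression by gradient descent (stand-in for a neural discriminator).
w, b = 0.0, 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(w * X + b)))
    w -= 0.1 * np.mean((p - y) * X)
    b -= 0.1 * np.mean(p - y)

def occupancy_reward(s):
    """r(s) = log d(s) - log(1 - d(s)); for a logistic model this is the logit."""
    return w * s + b

# An expert-like state should receive a higher occupancy-matching reward
# than a state typical only of the offline data.
print(occupancy_reward(2.0) > occupancy_reward(0.0))
```

In SMODICE itself, a reward of this form feeds a Fenchel-dual value-learning step whose solution yields importance weights for supervised (weighted behavior cloning) policy extraction; this snippet only illustrates the discriminator-based reward, not that pipeline.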

Cite

Text

Ma et al. "Versatile Offline Imitation Learning via State-Occupancy Matching." ICLR 2022 Workshops: GPL, 2022.

Markdown

[Ma et al. "Versatile Offline Imitation Learning via State-Occupancy Matching." ICLR 2022 Workshops: GPL, 2022.](https://mlanthology.org/iclrw/2022/ma2022iclrw-versatile/)

BibTeX

@inproceedings{ma2022iclrw-versatile,
  title     = {{Versatile Offline Imitation Learning via State-Occupancy Matching}},
  author    = {Ma, Yecheng Jason and Shen, Andrew and Jayaraman, Dinesh and Bastani, Osbert},
  booktitle = {ICLR 2022 Workshops: GPL},
  year      = {2022},
  url       = {https://mlanthology.org/iclrw/2022/ma2022iclrw-versatile/}
}