Imitation Learning as Return Distribution Matching

Lazzati, Filippo; Metelli, Alberto Maria

ICLR 2026

Abstract

We study the problem of training a risk-sensitive reinforcement learning (RL) agent through imitation learning (IL). Unlike standard IL, our goal is not only to train an agent that matches the expert’s expected return (i.e., its *average performance*) but also its *risk attitude* (i.e., other features of the return distribution, such as variance). We propose a general formulation of the risk-sensitive IL problem in which the objective is to match the expert’s return distribution in Wasserstein distance. We focus on the tabular setting and assume the expert’s reward is *known*. After demonstrating the limited expressivity of Markovian policies for this task, we introduce an efficient and sufficiently expressive subclass of non-Markovian policies tailored to it. Building on this subclass, we develop two provably efficient algorithms, RS-BC and RS-KT, for solving the problem when the transition model is unknown and known, respectively. We show that RS-KT achieves substantially lower sample complexity than RS-BC by exploiting dynamics information. We further demonstrate the sample efficiency of return distribution matching in the setting where the expert’s reward is *unknown* by designing an oracle-based variant of RS-KT. Finally, we complement our theoretical analysis of RS-KT and RS-BC with numerical simulations, highlighting both their sample efficiency and the advantages of non-Markovian policies over standard sample-efficient IL algorithms.
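To illustrate why matching the expected return alone can miss the expert's risk attitude, the following minimal sketch compares two sets of returns with identical means using an empirical 1-Wasserstein distance. It is not the paper's RS-BC or RS-KT algorithm; the function name `wasserstein_1d` and the sample returns are hypothetical, and the sketch assumes both samples have the same size, in which case the distance reduces to the mean absolute difference between sorted samples.

```python
import numpy as np


def wasserstein_1d(returns_a, returns_b):
    """Empirical 1-Wasserstein distance between two equal-size samples.

    For one-dimensional samples of equal size, the optimal coupling pairs
    the sorted values, so the distance is the mean absolute difference
    between the sorted arrays.
    """
    a = np.sort(np.asarray(returns_a, dtype=float))
    b = np.sort(np.asarray(returns_b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(a - b)))


# Hypothetical returns (illustrative values, not taken from the paper).
expert_returns = np.array([0.0, 0.0, 10.0, 10.0])   # risky behaviour, mean 5
imitator_returns = np.array([5.0, 5.0, 5.0, 5.0])   # safe behaviour, mean 5

# Gap in expected return: zero, so a mean-matching criterion sees no error.
mean_gap = abs(expert_returns.mean() - imitator_returns.mean())

# Gap between return distributions: strictly positive.
w1 = wasserstein_1d(expert_returns, imitator_returns)

print(f"expected-return gap: {mean_gap}")
print(f"1-Wasserstein gap:   {w1}")
```

In this toy case the imitator matches the expert's average performance exactly, yet the Wasserstein gap is nonzero because the two return distributions differ in spread.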

Cite

Text

Lazzati and Metelli. "Imitation Learning as Return Distribution Matching." International Conference on Learning Representations, 2026.

Markdown

[Lazzati and Metelli. "Imitation Learning as Return Distribution Matching." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/lazzati2026iclr-imitation/)

BibTeX

@inproceedings{lazzati2026iclr-imitation,
  title     = {{Imitation Learning as Return Distribution Matching}},
  author    = {Lazzati, Filippo and Metelli, Alberto Maria},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/lazzati2026iclr-imitation/}
}