Efficient Inverse Reinforcement Learning Without Compounding Errors

Abstract

Inverse reinforcement learning (IRL) is an on-policy approach to imitation learning (IL) that allows the learner to observe the consequences of its actions at train time. Accordingly, there are two seemingly contradictory desiderata for IRL algorithms: (a) preventing the compounding errors that stymie offline approaches like behavioral cloning and (b) avoiding the worst-case exploration complexity of reinforcement learning (RL). Prior work has been able to achieve either (a) or (b) but not both simultaneously. In our work, we first prove a negative result showing that, without further assumptions, there are no efficient IRL algorithms that learn a competitive policy in the worst case. We then provide a positive result: under a novel structural condition we term reward-agnostic policy completeness, we prove that efficient IRL algorithms do avoid compounding errors, giving us the best of both worlds. We also propose a principled method for using sub-optimal data to further improve the sample efficiency of efficient IRL algorithms.
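
The abstract contrasts offline behavioral cloning with on-policy IRL, whose defining feature is that the learner is trained on its own rollouts. The sketch below illustrates that generic on-policy IRL template (alternate between fitting a reward that separates expert occupancy from learner occupancy, and improving the policy on the learner's own rollouts) on a toy chain MDP. The chain dynamics, the linear reward class, and the multiplicative-weights policy update are illustrative assumptions for exposition only, not the algorithm or structural condition analyzed in the paper.

import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, horizon = 5, 2, 10

def step(s, a):
    # Toy deterministic chain: action 1 moves right, action 0 stays put.
    return min(s + a, n_states - 1)

def rollout(policy):
    # Collect one on-policy trajectory of (state, action) pairs.
    s, traj = 0, []
    for _ in range(horizon):
        a = rng.choice(n_actions, p=policy[s])
        traj.append((s, a))
        s = step(s, a)
    return traj

def occupancy(trajs):
    # Empirical state-action visitation frequencies.
    mu = np.zeros((n_states, n_actions))
    for traj in trajs:
        for s, a in traj:
            mu[s, a] += 1
    return mu / mu.sum()

# "Expert" demonstrations: a policy that always moves right.
expert_policy = np.zeros((n_states, n_actions))
expert_policy[:, 1] = 1.0
expert_mu = occupancy([rollout(expert_policy) for _ in range(50)])

# Learner starts from the uniform policy.
policy = np.full((n_states, n_actions), 1.0 / n_actions)
for _ in range(50):
    learner_mu = occupancy([rollout(policy) for _ in range(10)])
    # Reward step: pick the (linear) reward that most separates expert
    # occupancy from the learner's current on-policy occupancy.
    reward = expert_mu - learner_mu
    # Policy step: a soft multiplicative-weights improvement under that
    # reward; the on-policy rollouts above are what let the learner observe
    # the consequences of its own actions.
    policy = policy * np.exp(5.0 * reward)
    policy = policy / policy.sum(axis=1, keepdims=True)

print(np.round(policy, 2))  # the learner should come to favor action 1, matching the expert

Because every policy update is driven by states the learner itself visits, errors are corrected where they occur rather than compounding over the horizon, which is the property the abstract attributes to on-policy IRL over behavioral cloning.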

Cite

Text

Dice et al. "Efficient Inverse Reinforcement Learning Without Compounding Errors." ICML 2024 Workshops: MFHAIA, 2024.

Markdown

[Dice et al. "Efficient Inverse Reinforcement Learning Without Compounding Errors." ICML 2024 Workshops: MFHAIA, 2024.](https://mlanthology.org/icmlw/2024/dice2024icmlw-efficient/)

BibTeX

@inproceedings{dice2024icmlw-efficient,
  title     = {{Efficient Inverse Reinforcement Learning Without Compounding Errors}},
  author    = {Dice, Nicolas Espinosa and Swamy, Gokul and Choudhury, Sanjiban and Sun, Wen},
  booktitle = {ICML 2024 Workshops: MFHAIA},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/dice2024icmlw-efficient/}
}