Beyond-Expert Performance with Limited Demonstrations: Efficient Imitation Learning with Double Exploration
Abstract
Imitation learning is a central problem in reinforcement learning where the goal is to learn a policy that mimics the expert's behavior. In practice, it is often challenging to learn the expert policy accurately from a limited number of demonstrations due to the complexity of the state space. Moreover, exploring the environment and collecting additional data is essential for achieving beyond-expert performance. To overcome these challenges, we propose a novel imitation learning algorithm called Imitation Learning with Double Exploration (ILDE), which implements exploration in two aspects: (1) optimistic policy optimization via an exploration bonus that rewards state-action pairs with high uncertainty, to potentially improve convergence to the expert policy, and (2) curiosity-driven exploration of states that deviate from the demonstration trajectories, to potentially yield beyond-expert performance. Empirically, we demonstrate that ILDE outperforms state-of-the-art imitation learning algorithms in terms of sample efficiency and achieves beyond-expert performance on Atari and MuJoCo tasks with fewer demonstrations than in previous work. We also provide a theoretical justification of ILDE as an uncertainty-regularized policy optimization method with optimistic exploration, leading to a regret that grows sublinearly in the number of episodes.
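The two exploration terms described in the abstract can be pictured as a shaped reward that adds an uncertainty bonus and a curiosity bonus to the environment reward. The sketch below is purely illustrative and assumes a count-based uncertainty proxy and a distance-to-demonstration curiosity signal; the function names, weights `beta1`/`beta2`, and scalar state representation are hypothetical, not the paper's actual implementation.

```python
import math

def uncertainty_bonus(visit_count: int) -> float:
    # Count-based proxy for uncertainty: rarely visited state-action
    # pairs receive a larger optimistic bonus (illustrative choice).
    return 1.0 / math.sqrt(visit_count + 1)

def curiosity_bonus(state: float, demo_states: list[float]) -> float:
    # Reward deviation from the demonstration trajectories: the farther
    # the state is from its nearest demonstrated state, the larger the
    # bonus; clipped to keep it bounded.
    nearest = min(abs(state - d) for d in demo_states)
    return min(nearest, 1.0)

def shaped_reward(env_reward: float, state: float, visit_count: int,
                  demo_states: list[float],
                  beta1: float = 0.1, beta2: float = 0.05) -> float:
    # Combine the environment reward with both exploration terms
    # ("double exploration" in the abstract's terminology).
    return (env_reward
            + beta1 * uncertainty_bonus(visit_count)
            + beta2 * curiosity_bonus(state, demo_states))
```

For example, an unvisited state far from the demonstrations receives both bonuses in full, while a frequently visited state near a demonstration receives almost none, which is the qualitative behavior the two exploration terms are meant to produce.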
Cite
Text

Zhao et al. "Beyond-Expert Performance with Limited Demonstrations: Efficient Imitation Learning with Double Exploration." International Conference on Learning Representations, 2025.

Markdown

[Zhao et al. "Beyond-Expert Performance with Limited Demonstrations: Efficient Imitation Learning with Double Exploration." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/zhao2025iclr-beyondexpert/)

BibTeX
@inproceedings{zhao2025iclr-beyondexpert,
title = {{Beyond-Expert Performance with Limited Demonstrations: Efficient Imitation Learning with Double Exploration}},
author = {Zhao, Heyang and Yu, Xingrui and Bossens, David Mark and Tsang, Ivor and Gu, Quanquan},
booktitle = {International Conference on Learning Representations},
year = {2025},
url = {https://mlanthology.org/iclr/2025/zhao2025iclr-beyondexpert/}
}