Beyond-Expert Performance with Limited Demonstrations: Efficient Imitation Learning with Double Exploration
Abstract
Imitation learning is a central problem in reinforcement learning where the goal is to learn a policy that mimics the expert's behavior. In practice, it is often challenging to learn the expert policy accurately from a limited number of demonstrations due to the complexity of the state space. Moreover, exploring the environment and collecting additional data is essential for achieving beyond-expert performance. To overcome these challenges, we propose a novel imitation learning algorithm called Imitation Learning with Double Exploration (ILDE), which implements exploration in two aspects: (1) optimistic policy optimization via an exploration bonus that rewards state-action pairs with high uncertainty, to potentially improve convergence to the expert policy, and (2) curiosity-driven exploration of states that deviate from the demonstration trajectories, to potentially yield beyond-expert performance. Empirically, we demonstrate that ILDE outperforms state-of-the-art imitation learning algorithms in terms of sample efficiency and achieves beyond-expert performance on Atari and MuJoCo tasks with fewer demonstrations than in previous work. We also provide a theoretical justification of ILDE as an uncertainty-regularized policy optimization method with optimistic exploration, leading to a regret that grows sublinearly in the number of episodes.
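The two exploration terms described in the abstract can be pictured as a shaped reward that adds an uncertainty bonus and a curiosity bonus to the environment reward. The sketch below is purely illustrative and assumes a count-based uncertainty proxy and a distance-to-demonstration curiosity signal; the function names, weights `beta1`/`beta2`, and scalar state representation are hypothetical, not the paper's actual implementation.

```python
import math

def uncertainty_bonus(visit_count: int) -> float:
    # Count-based proxy for uncertainty: rarely visited state-action
    # pairs receive a larger optimistic bonus (illustrative choice).
    return 1.0 / math.sqrt(visit_count + 1)

def curiosity_bonus(state: float, demo_states: list[float]) -> float:
    # Reward deviation from the demonstration trajectories: the farther
    # the state is from its nearest demonstrated state, the larger the
    # bonus; clipped to keep it bounded.
    nearest = min(abs(state - d) for d in demo_states)
    return min(nearest, 1.0)

def shaped_reward(env_reward: float, state: float, visit_count: int,
                  demo_states: list[float],
                  beta1: float = 0.1, beta2: float = 0.05) -> float:
    # Combine the environment reward with both exploration terms
    # ("double exploration" in the abstract's terminology).
    return (env_reward
            + beta1 * uncertainty_bonus(visit_count)
            + beta2 * curiosity_bonus(state, demo_states))
```

For example, an unvisited state far from the demonstrations receives both bonuses in full, while a frequently visited state near a demonstration receives almost none, which is the qualitative behavior the two exploration terms are meant to produce.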
Cite
Text

Zhao et al. "Beyond-Expert Performance with Limited Demonstrations: Efficient Imitation Learning with Double Exploration." International Conference on Learning Representations, 2025.

Markdown

[Zhao et al. "Beyond-Expert Performance with Limited Demonstrations: Efficient Imitation Learning with Double Exploration." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/zhao2025iclr-beyondexpert/)

BibTeX
@inproceedings{zhao2025iclr-beyondexpert,
title = {{Beyond-Expert Performance with Limited Demonstrations: Efficient Imitation Learning with Double Exploration}},
author = {Zhao, Heyang and Yu, Xingrui and Bossens, David Mark and Tsang, Ivor and Gu, Quanquan},
booktitle = {International Conference on Learning Representations},
year = {2025},
url = {https://mlanthology.org/iclr/2025/zhao2025iclr-beyondexpert/}
}