Learning Noise-Induced Reward Functions for Surpassing Demonstrations in Imitation Learning

Abstract

Imitation learning (IL) has recently shown impressive performance in training a reinforcement learning agent with human demonstrations, eliminating the difficulty of designing elaborate reward functions in complex environments. However, most IL methods work under the assumption of the optimality of the demonstrations and thus cannot learn policies to surpass the demonstrators. Some methods have been investigated to obtain better-than-demonstration (BD) performance with inner human feedback or preference labels. In this paper, we propose a method to learn rewards from suboptimal demonstrations via a weighted preference learning technique (LERP). Specifically, we first formulate the suboptimality of demonstrations as the inaccurate estimation of rewards. The inaccuracy is modeled with a reward noise random variable following the Gumbel distribution. Moreover, we derive an upper bound of the expected return with different noise coefficients and propose a theorem to surpass the demonstrations. Unlike existing literature, our analysis does not depend on the linear reward constraint. Consequently, we develop a BD model with a weighted preference learning technique. Experimental results on continuous control and high-dimensional discrete control tasks show the superiority of our LERP method over other state-of-the-art BD methods.
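As a rough illustration of the noise model mentioned in the abstract, the following toy sketch shows one well-known consequence of adding Gumbel noise to rewards: if a demonstrator picks the option with the highest noisy reward, its choice frequencies follow a softmax over the true rewards, with the noise coefficient acting as a temperature. This is not the LERP algorithm or the paper's derivation. The function names, the three-option reward vector, and the `noise_scale` parameter are assumptions made purely for illustration.

```python
import math
import random


def softmax(values, scale):
    """Softmax of values / scale, computed in a numerically stable way."""
    shifted = [v / scale for v in values]
    top = max(shifted)
    exps = [math.exp(s - top) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]


def sample_gumbel(rng):
    """Draw one standard Gumbel(0, 1) sample via the inverse CDF."""
    u = rng.random()
    while u <= 0.0:  # avoid log(0); random() can return exactly 0.0
        u = rng.random()
    return -math.log(-math.log(u))


def noisy_choice_frequencies(rewards, noise_scale, n_samples, seed=0):
    """Hypothetical demonstrator: perturb each reward with scaled Gumbel
    noise, pick the argmax, and report how often each option is chosen."""
    rng = random.Random(seed)
    counts = [0] * len(rewards)
    for _ in range(n_samples):
        noisy = [r + noise_scale * sample_gumbel(rng) for r in rewards]
        counts[noisy.index(max(noisy))] += 1
    return [c / n_samples for c in counts]


if __name__ == "__main__":
    rewards = [1.0, 2.0, 0.5]  # assumed true rewards of three options
    for noise_scale in (0.5, 1.0, 5.0):
        empirical = noisy_choice_frequencies(rewards, noise_scale, 20000)
        predicted = softmax(rewards, noise_scale)
        print(noise_scale, empirical, predicted)
```

Under this toy model, a larger `noise_scale` pushes the choice frequencies toward uniform, which is one simple way to picture a more suboptimal demonstrator; a smaller value concentrates choices on the highest-reward option.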

Cite

Text

Huo et al. "Learning Noise-Induced Reward Functions for Surpassing Demonstrations in Imitation Learning." AAAI Conference on Artificial Intelligence, 2023. doi:10.1609/AAAI.V37I7.25962

Markdown

[Huo et al. "Learning Noise-Induced Reward Functions for Surpassing Demonstrations in Imitation Learning." AAAI Conference on Artificial Intelligence, 2023.](https://mlanthology.org/aaai/2023/huo2023aaai-learning/) doi:10.1609/AAAI.V37I7.25962

BibTeX

@inproceedings{huo2023aaai-learning,
  title     = {{Learning Noise-Induced Reward Functions for Surpassing Demonstrations in Imitation Learning}},
  author    = {Huo, Liangyu and Wang, Zulin and Xu, Mai},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2023},
  pages     = {7953--7961},
  doi       = {10.1609/AAAI.V37I7.25962},
  url       = {https://mlanthology.org/aaai/2023/huo2023aaai-learning/}
}