Enhancing Online Reinforcement Learning with Meta-Learned Objective from Offline Data

Abstract

A major challenge in Reinforcement Learning (RL) is the difficulty of learning an optimal policy from sparse rewards. Prior works enhance online RL with conventional Imitation Learning (IL) via a handcrafted auxiliary objective, at the cost of restricting the RL policy to be sub-optimal when the offline data is generated by a non-expert policy. Instead, to better leverage valuable information in offline data, we develop Generalized Imitation Learning from Demonstration (GILD), which meta-learns an objective that distills knowledge from offline data and instills intrinsic motivation towards the optimal policy. Distinct from prior works that are exclusive to a specific RL algorithm, GILD is a flexible module intended for diverse vanilla off-policy RL algorithms. In addition, GILD introduces no domain-specific hyperparameters and only a minimal increase in computational cost. In four challenging MuJoCo tasks with sparse rewards, we show that three RL algorithms enhanced with GILD significantly outperform state-of-the-art methods.

Cite

Text

Deng et al. "Enhancing Online Reinforcement Learning with Meta-Learned Objective from Offline Data." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I15.33784

Markdown

[Deng et al. "Enhancing Online Reinforcement Learning with Meta-Learned Objective from Offline Data." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/deng2025aaai-enhancing-a/) doi:10.1609/AAAI.V39I15.33784

BibTeX

@inproceedings{deng2025aaai-enhancing-a,
  title     = {{Enhancing Online Reinforcement Learning with Meta-Learned Objective from Offline Data}},
  author    = {Deng, Shilong and Zheng, Zetao and He, Hongcai and Weng, Paul and Shao, Jie},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {16244--16252},
  doi       = {10.1609/AAAI.V39I15.33784},
  url       = {https://mlanthology.org/aaai/2025/deng2025aaai-enhancing-a/}
}