Enhancing Online Reinforcement Learning with Meta-Learned Objective from Offline Data
Abstract
A major challenge in Reinforcement Learning (RL) is learning an optimal policy from sparse rewards. Prior works enhance online RL with conventional Imitation Learning (IL) via a handcrafted auxiliary objective, at the cost of restricting the learned policy to be sub-optimal when the offline data is generated by a non-expert policy. Instead, to better leverage the valuable information in offline data, we develop Generalized Imitation Learning from Demonstration (GILD), which meta-learns an objective that distills knowledge from offline data and instills intrinsic motivation towards the optimal policy. Distinct from prior works that are tied to a specific RL algorithm, GILD is a flexible module intended for diverse vanilla off-policy RL algorithms. In addition, GILD introduces no domain-specific hyperparameters and only a minimal increase in computational cost. On four challenging MuJoCo tasks with sparse rewards, we show that three RL algorithms enhanced with GILD significantly outperform state-of-the-art methods.
Cite
Text
Deng et al. "Enhancing Online Reinforcement Learning with Meta-Learned Objective from Offline Data." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I15.33784
Markdown
[Deng et al. "Enhancing Online Reinforcement Learning with Meta-Learned Objective from Offline Data." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/deng2025aaai-enhancing-a/) doi:10.1609/AAAI.V39I15.33784
BibTeX
@inproceedings{deng2025aaai-enhancing-a,
title = {{Enhancing Online Reinforcement Learning with Meta-Learned Objective from Offline Data}},
author = {Deng, Shilong and Zheng, Zetao and He, Hongcai and Weng, Paul and Shao, Jie},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2025},
pages = {16244--16252},
doi = {10.1609/AAAI.V39I15.33784},
url = {https://mlanthology.org/aaai/2025/deng2025aaai-enhancing-a/}
}