Pretraining a Shared Q-Network for Data-Efficient Offline Reinforcement Learning
Abstract
Offline reinforcement learning (RL) aims to learn a policy from a fixed dataset without additional environment interaction. However, effective offline policy learning often requires a large and diverse dataset to mitigate epistemic uncertainty. Collecting such data demands substantial online interaction, which is costly or infeasible in many real-world domains. Therefore, improving policy learning from limited offline data (i.e., achieving high data efficiency) is critical for practical offline RL. In this paper, we propose a simple yet effective plug-and-play pretraining framework that initializes the feature representation of a $Q$-network to enhance data efficiency in offline RL. Our approach employs a shared $Q$-network architecture trained in two stages: first, pretraining a backbone feature extractor with a transition prediction head; second, training a $Q$-network, which combines the pretrained backbone with a $Q$-value head, under *any* offline RL objective. Extensive experiments on the D4RL, Robomimic, V-D4RL, and ExoRL benchmarks show that our method substantially improves both performance and data efficiency across diverse datasets and domains. Remarkably, with only **10%** of the dataset, our approach outperforms standard offline RL baselines trained on the full data.
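The two-stage recipe above is concrete enough to sketch in code. Below is a minimal PyTorch sketch of the shared $Q$-network idea for vector observations (the paper also covers pixel benchmarks such as V-D4RL); the class and function names, the MLP sizes, and the MSE transition-prediction loss are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn


class SharedQNetwork(nn.Module):
    """Shared backbone with a transition-prediction head (stage 1)
    and a Q-value head (stage 2). Architecture details are assumptions."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        # Backbone feature extractor shared by both training stages.
        self.backbone = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Stage-1 head: predicts the next observation from (s, a) features.
        self.transition_head = nn.Linear(hidden, obs_dim)
        # Stage-2 head: maps the same features to a scalar Q-value.
        self.q_head = nn.Linear(hidden, 1)

    def features(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.backbone(torch.cat([obs, act], dim=-1))

    def predict_next_obs(self, obs, act):
        return self.transition_head(self.features(obs, act))

    def q_value(self, obs, act):
        return self.q_head(self.features(obs, act))


def pretrain_step(net, opt, batch):
    """Stage 1: fit the backbone via transition prediction (MSE assumed)."""
    pred = net.predict_next_obs(batch["obs"], batch["act"])
    loss = nn.functional.mse_loss(pred, batch["next_obs"])
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()


if __name__ == "__main__":
    net = SharedQNetwork(obs_dim=17, act_dim=6)
    opt = torch.optim.Adam(net.parameters(), lr=3e-4)
    batch = {
        "obs": torch.randn(32, 17),
        "act": torch.randn(32, 6),
        "next_obs": torch.randn(32, 17),
    }
    print("stage-1 loss:", pretrain_step(net, opt, batch))
    # Stage 2: keep the pretrained backbone and train net.q_value(obs, act)
    # under any offline RL objective (e.g., a TD loss with a behavior-
    # regularization or conservatism term).
```

Because stage 2 only swaps the head and the training objective, the pretrained backbone can be dropped into any offline RL learner (e.g., CQL, TD3+BC, or IQL), which is what makes the scheme plug-and-play.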
Cite
Text
Park et al. "Pretraining a Shared Q-Network for Data-Efficient Offline Reinforcement Learning." Advances in Neural Information Processing Systems, 2025.
Markdown
[Park et al. "Pretraining a Shared Q-Network for Data-Efficient Offline Reinforcement Learning." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/park2025neurips-pretraining/)
BibTeX
@inproceedings{park2025neurips-pretraining,
title = {{Pretraining a Shared Q-Network for Data-Efficient Offline Reinforcement Learning}},
author = {Park, Jongchan and Park, Mingyu and Lee, Donghwan},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/park2025neurips-pretraining/}
}