The Provable Benefit of Unsupervised Data Sharing for Offline Reinforcement Learning
Abstract
Self-supervised methods have become crucial for advancing deep learning by leveraging data itself to reduce the need for expensive annotations. However, how to conduct self-supervised offline reinforcement learning (RL) in a principled way remains an open question. In this paper, we address this issue by investigating the theoretical benefits of utilizing reward-free data in linear Markov Decision Processes (MDPs) within a semi-supervised setting. Building on this analysis, we propose a novel algorithm, Provable Data Sharing (PDS), that utilizes such reward-free data for offline RL. PDS applies additional penalties to the reward function learned from labeled data to prevent overestimation, ensuring that the algorithm remains conservative. Our results on various offline RL tasks demonstrate that PDS significantly improves the performance of offline RL algorithms with reward-free data. Overall, our work provides a promising approach to leveraging the benefits of unlabeled data in offline RL while maintaining theoretical guarantees. We believe our findings will contribute to developing more robust self-supervised RL methods.
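To make the core idea concrete, here is a minimal sketch of pessimistic reward labeling in a linear MDP: fit a linear reward model on the labeled data, then assign pseudo-rewards to reward-free transitions minus an uncertainty penalty so that data sharing cannot cause overestimation. This is an illustrative approximation, not the paper's exact construction; the feature map, function name, and hyperparameters (`lam`, `beta`) are assumptions for the example.

```python
import numpy as np

def pessimistic_reward_labels(phi_labeled, r_labeled, phi_unlabeled,
                              lam=1.0, beta=1.0):
    """Illustrative sketch of pessimistic pseudo-labeling for reward-free data.

    phi_labeled:   (n, d) features of labeled transitions
    r_labeled:     (n,)   observed rewards
    phi_unlabeled: (m, d) features of reward-free transitions
    lam:  ridge regularization strength (assumed hyperparameter)
    beta: penalty scale controlling the degree of pessimism (assumed)
    """
    d = phi_labeled.shape[1]
    # Ridge-regression estimate of the linear reward parameter.
    cov = phi_labeled.T @ phi_labeled + lam * np.eye(d)
    theta_hat = np.linalg.solve(cov, phi_labeled.T @ r_labeled)

    # Elliptical uncertainty of the reward estimate at each unlabeled point.
    cov_inv = np.linalg.inv(cov)
    bonus = np.sqrt(np.einsum("id,de,ie->i",
                              phi_unlabeled, cov_inv, phi_unlabeled))

    # Conservative pseudo-rewards: predicted reward minus an uncertainty penalty.
    return phi_unlabeled @ theta_hat - beta * bonus
```

Under these assumptions, the pseudo-labeled transitions can be merged with the labeled dataset and passed to any conservative offline RL algorithm; the penalty shrinks as the labeled data covers the relevant feature directions.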
Cite

Text
Hu et al. "The Provable Benefit of Unsupervised Data Sharing for Offline Reinforcement Learning." International Conference on Learning Representations, 2023.

Markdown
[Hu et al. "The Provable Benefit of Unsupervised Data Sharing for Offline Reinforcement Learning." International Conference on Learning Representations, 2023.](https://mlanthology.org/iclr/2023/hu2023iclr-provable-a/)

BibTeX
@inproceedings{hu2023iclr-provable-a,
title = {{The Provable Benefit of Unsupervised Data Sharing for Offline Reinforcement Learning}},
author = {Hu, Hao and Yang, Yiqin and Zhao, Qianchuan and Zhang, Chongjie},
booktitle = {International Conference on Learning Representations},
year = {2023},
url = {https://mlanthology.org/iclr/2023/hu2023iclr-provable-a/}
}