Unsupervised Zero-Shot Reinforcement Learning via Dual-Value Forward-Backward Representation

Abstract

Online unsupervised reinforcement learning (URL) can discover diverse skills via reward-free pre-training and exhibits impressive downstream task adaptation abilities through further fine-tuning. However, online URL methods face challenges in achieving zero-shot generalization, i.e., directly applying pre-trained policies to downstream tasks without additional planning or learning. In this paper, we propose a novel Dual-Value Forward-Backward representation (DVFB) framework with a contrastive entropy intrinsic reward to achieve both zero-shot generalization and fine-tuning adaptation in online URL. On the one hand, we demonstrate that poor exploration in forward-backward representations can lead to limited data diversity in online URL, impairing successor measures, and ultimately constraining generalization ability. To address this issue, the DVFB framework learns successor measures through a skill value function while promoting data diversity through an exploration value function, thus enabling zero-shot generalization. On the other hand, and somewhat surprisingly, by employing a straightforward dual-value fine-tuning scheme combined with a reward mapping technique, the pre-trained policy further enhances its performance through fine-tuning on downstream tasks, building on its zero-shot performance. Through extensive multi-task generalization experiments, DVFB demonstrates both superior zero-shot generalization (outperforming on all 12 tasks) and fine-tuning adaptation (leading on 10 out of 12 tasks) abilities, surpassing state-of-the-art URL methods.

Cite

Text

Sun et al. "Unsupervised Zero-Shot Reinforcement Learning via Dual-Value Forward-Backward Representation." International Conference on Learning Representations, 2025.

Markdown

[Sun et al. "Unsupervised Zero-Shot Reinforcement Learning via Dual-Value Forward-Backward Representation." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/sun2025iclr-unsupervised/)

BibTeX

@inproceedings{sun2025iclr-unsupervised,
  title     = {{Unsupervised Zero-Shot Reinforcement Learning via Dual-Value Forward-Backward Representation}},
  author    = {Sun, Jingbo and Tu, Songjun and Zhang, Qichao and Li, Haoran and Liu, Xin and Chen, Yaran and Chen, Ke and Zhao, Dongbin},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/sun2025iclr-unsupervised/}
}