Conservative Offline Goal-Conditioned Implicit V-Learning

Abstract

Offline goal-conditioned reinforcement learning (GCRL) learns a goal-conditioned value function to train policies for diverse goals from pre-collected datasets. Hindsight experience replay addresses the issue of sparse rewards by treating intermediate states as goals, but fails on goal-stitching tasks, where reaching a goal requires stitching together different trajectories. While cross-trajectory sampling is a potential solution that associates states and goals belonging to different trajectories, we demonstrate that this direct method degrades performance in goal-conditioned tasks due to the overestimation of values on unconnected pairs. To address this, we propose Conservative Goal-Conditioned Implicit Value Learning (CGCIVL), a novel algorithm that introduces a penalty term on value estimates for unconnected state-goal pairs and leverages the quasimetric framework to accurately estimate values for connected pairs. Evaluations on OGBench, a benchmark for offline GCRL, demonstrate that CGCIVL consistently surpasses state-of-the-art methods across diverse tasks.
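
The sketch below illustrates the two ideas named in the abstract: expectile-style implicit value learning on in-trajectory state-goal pairs, and a conservative penalty that pushes down value estimates for cross-trajectory (potentially unconnected) pairs. It is a minimal illustration, not the authors' implementation; names such as value_net, expectile, and penalty_weight are assumptions, and the quasimetric parameterization of the value function is omitted for brevity.

# Minimal sketch (assumed PyTorch interface V(s, g)); not the paper's code.
import torch
import torch.nn as nn


def expectile_loss(diff: torch.Tensor, expectile: float = 0.9) -> torch.Tensor:
    """Asymmetric squared loss used in implicit value learning."""
    weight = torch.where(diff > 0,
                         torch.full_like(diff, expectile),
                         torch.full_like(diff, 1.0 - expectile))
    return (weight * diff.pow(2)).mean()


def conservative_gc_value_loss(
    value_net: nn.Module,        # V(s, g), the learned goal-conditioned value
    target_net: nn.Module,       # slowly updated target copy of V
    states: torch.Tensor,        # states s_t from dataset trajectories
    next_states: torch.Tensor,   # successor states s_{t+1}
    goals: torch.Tensor,         # hindsight goals from the same trajectory
    random_goals: torch.Tensor,  # goals sampled from other trajectories
    gamma: float = 0.99,
    expectile: float = 0.9,
    penalty_weight: float = 1.0,
) -> torch.Tensor:
    # Sparse goal-conditioned reward: 0 on reaching the goal, -1 otherwise.
    reached = (next_states == goals).all(dim=-1, keepdim=True).float()
    rewards = reached - 1.0

    with torch.no_grad():
        target_v = rewards + gamma * (1.0 - reached) * target_net(next_states, goals)

    # (1) Implicit (expectile) value regression on in-trajectory pairs.
    v = value_net(states, goals)
    td_loss = expectile_loss(target_v - v, expectile)

    # (2) Conservative penalty: discourage overestimation on cross-trajectory
    # state-goal pairs that may be unconnected in the dataset.
    conservative_penalty = value_net(states, random_goals).mean()

    return td_loss + penalty_weight * conservative_penalty

In practice the penalty weight trades off conservatism against stitching: too small and values on unconnected pairs are overestimated, too large and valid cross-trajectory goals are undervalued.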

Cite

Text

Ke et al. "Conservative Offline Goal-Conditioned Implicit V-Learning." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Ke et al. "Conservative Offline Goal-Conditioned Implicit V-Learning." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/ke2025icml-conservative/)

BibTeX

@inproceedings{ke2025icml-conservative,
  title     = {{Conservative Offline Goal-Conditioned Implicit V-Learning}},
  author    = {Ke, Kaiqiang and Lin, Qian and Liu, Zongkai and He, Shenghong and Yu, Chao},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {29591--29607},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/ke2025icml-conservative/}
}