Conservative Offline Goal-Conditioned Implicit V-Learning
Abstract
Offline goal-conditioned reinforcement learning (GCRL) learns goal-conditioned value functions from pre-collected datasets to train policies that reach diverse goals. Hindsight experience replay mitigates sparse rewards by relabeling intermediate states as goals, but it fails on goal-stitching tasks, where reaching a goal requires stitching together segments of different trajectories. Cross-trajectory sampling, which pairs states with goals drawn from different trajectories, is a potential remedy, yet we demonstrate that applying it directly degrades performance on goal-conditioned tasks because values of unconnected state-goal pairs are overestimated. To address this, we propose Conservative Goal-Conditioned Implicit Value Learning (CGCIVL), a novel algorithm that introduces a penalty term to suppress value estimates for unconnected state-goal pairs and leverages the quasimetric framework to accurately estimate values for connected pairs. Evaluations on OGBench, a benchmark for offline GCRL, demonstrate that CGCIVL consistently surpasses state-of-the-art methods across diverse tasks.
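The components named in the abstract can be sketched as an IQL-style expectile value loss on hindsight-relabeled (connected) state-goal pairs, a quasimetric value parameterization V(s, g) = -d(s, g), and a conservative penalty that pushes down value estimates on cross-trajectory (unconnected) pairs. The snippet below is a minimal illustrative sketch of that structure, not the authors' implementation: the network architecture, the particular positive-part quasimetric, the expectile tau, the penalty weight alpha, and the sampling of g_conn / g_unconn are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class QuasimetricValue(nn.Module):
    """V(s, g) = -d(s, g), where d is an asymmetric (quasimetric) distance.

    The encoder sizes and the specific distance below are illustrative choices,
    not the parameterization used in the paper.
    """

    def __init__(self, obs_dim, emb_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, emb_dim)
        )

    def forward(self, s, g):
        zs, zg = self.encoder(s), self.encoder(g)
        # Positive-part distance: d(s, s) = 0, d(s, g) >= 0, and in general
        # d(s, g) != d(g, s), so d behaves like a quasimetric.
        d = torch.relu(zg - zs).max(dim=-1).values
        return -d  # larger value means the goal is estimated to be closer


def expectile_loss(diff, tau=0.7):
    # Asymmetric L2 loss used by implicit (IQL-style) value learning.
    weight = torch.abs(tau - (diff < 0).float())
    return (weight * diff ** 2).mean()


def conservative_gc_value_loss(value_fn, target_fn, s, s_next, r,
                               g_conn, g_unconn, gamma=0.99, alpha=1.0):
    """Sketch of a conservative goal-conditioned value objective.

    g_conn:   goals relabeled from the same trajectory as s (hindsight pairs).
    g_unconn: goals drawn from other trajectories (cross-trajectory pairs).
    """
    with torch.no_grad():
        target = r + gamma * target_fn(s_next, g_conn)
    # Implicit V-learning on connected pairs via expectile regression.
    loss_connected = expectile_loss(target - value_fn(s, g_conn))
    # Conservative penalty: push down values of unconnected pairs to counter
    # the overestimation caused by naive cross-trajectory sampling.
    penalty = value_fn(s, g_unconn).mean()
    return loss_connected + alpha * penalty
```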
Cite
Text
Ke et al. "Conservative Offline Goal-Conditioned Implicit V-Learning." Proceedings of the 42nd International Conference on Machine Learning, 2025.
Markdown
[Ke et al. "Conservative Offline Goal-Conditioned Implicit V-Learning." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/ke2025icml-conservative/)
BibTeX
@inproceedings{ke2025icml-conservative,
  title     = {{Conservative Offline Goal-Conditioned Implicit V-Learning}},
  author    = {Ke, Kaiqiang and Lin, Qian and Liu, Zongkai and He, Shenghong and Yu, Chao},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {29591--29607},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/ke2025icml-conservative/}
}