ACTIVE: Offline Reinforcement Learning via Adaptive Imitation and In-Sample $v$-Ensemble

Chen, Tianyuan; Cai, Ronglong; Wu, Faguo; Zhang, Xiao

ICLR 2025

Abstract

Offline reinforcement learning (RL) aims to learn from static datasets and thus faces the challenge of value estimation errors for out-of-distribution actions. The in-sample learning scheme addresses this issue by performing implicit TD backups that do not query the values of unseen actions. However, existing in-sample value learning and policy extraction methods suffer from over-regularization, limiting their performance on suboptimal or compositional datasets. In this paper, we analyze key factors in in-sample learning that may hinder the use of a milder constraint. We propose Actor-Critic with Temperature adjustment and In-sample Value Ensemble (ACTIVE), a novel in-sample offline RL algorithm that leverages an ensemble of $V$-functions for critic training and adaptively adjusts the constraint level using dual gradient descent. We theoretically show that the $V$-ensemble suppresses the accumulation of initial value errors, thereby mitigating overestimation. Our experiments on the D4RL benchmarks demonstrate that ACTIVE alleviates overfitting of value functions and outperforms existing in-sample methods in terms of learning stability and policy optimality.
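
The abstract mentions adjusting the constraint level with dual gradient descent but does not give the objective or the constraint. The sketch below is therefore not the ACTIVE update; it only illustrates the generic technique on a toy problem chosen for this example (minimize $(x-3)^2$ subject to $x^2 \le 1$). The function name, step size, and iteration count are all assumptions.

```python
def dual_gradient_descent(steps=200, lr=0.5):
    """Toy dual gradient descent: minimize (x - 3)^2 subject to x^2 <= 1.

    Illustrative only; the problem and hyperparameters are assumptions,
    not taken from the paper.
    """
    lam = 0.0  # Lagrange multiplier, kept non-negative
    x = 3.0    # unconstrained minimizer, used as the starting point
    for _ in range(steps):
        # Primal step: exact minimizer of the Lagrangian
        # (x - 3)^2 + lam * (x^2 - 1) for the current multiplier.
        x = 3.0 / (1.0 + lam)
        # Dual step: gradient ascent on lam using the constraint
        # violation, projected back onto lam >= 0.
        violation = x * x - 1.0
        lam = max(0.0, lam + lr * violation)
    return x, lam


if __name__ == "__main__":
    x, lam = dual_gradient_descent()
    print(f"x = {x:.4f}, multiplier = {lam:.4f}")
```

On this toy problem the multiplier grows while the constraint is violated and shrinks once it is slack, so the iterate settles at the constrained optimum $x = 1$ with multiplier $2$. An adaptive constraint weight in an offline RL objective would presumably follow the same pattern, with the constraint violation estimated from data rather than computed in closed form.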

Cite

Text

Chen et al. "ACTIVE: Offline Reinforcement Learning via Adaptive Imitation and In-Sample $v$-Ensemble." International Conference on Learning Representations, 2025.

Markdown

[Chen et al. "ACTIVE: Offline Reinforcement Learning via Adaptive Imitation and In-Sample $v$-Ensemble." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/chen2025iclr-active/)

BibTeX

@inproceedings{chen2025iclr-active,
  title     = {{ACTIVE: Offline Reinforcement Learning via Adaptive Imitation and In-Sample $v$-Ensemble}},
  author    = {Chen, Tianyuan and Cai, Ronglong and Wu, Faguo and Zhang, Xiao},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/chen2025iclr-active/}
}