SUF: Stabilized Unconstrained Fine-Tuning for Offline-to-Online Reinforcement Learning

Abstract

Offline-to-online reinforcement learning (RL) offers a promising way to improve suboptimal offline pre-trained policies through online fine-tuning. However, one efficient method, unconstrained fine-tuning, often suffers from severe policy collapse due to excessive distribution shift. To ensure stability, existing methods retain offline constraints and employ additional techniques during fine-tuning, which hurts efficiency. In this work, we introduce a novel perspective: eliminating policy collapse without imposing constraints. We observe that this collapse arises from a mismatch between unconstrained fine-tuning and the conventional RL training framework. Based on this observation, we propose Stabilized Unconstrained Fine-tuning (SUF), a streamlined framework that retains the efficiency of unconstrained fine-tuning while ensuring stability by modifying the Update-To-Data (UTD) ratio. Requiring only a few lines of code changes, SUF adapts readily to diverse backbones and outperforms state-of-the-art baselines.
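
For readers unfamiliar with the term, the Update-To-Data ratio is the number of gradient updates performed per environment step collected online. The sketch below only illustrates where that ratio sits in a generic fine-tuning loop; it is not the SUF algorithm, and the abstract does not state how SUF changes the ratio. The function name `count_updates` and the placeholder transition and update steps are assumptions made for illustration.

```python
def count_updates(num_env_steps: int, utd_ratio: float) -> int:
    """Hypothetical helper: run a skeleton online loop and return how many
    gradient updates a given update-to-data (UTD) ratio would trigger."""
    replay_buffer = []   # stand-in for a buffer seeded with offline data
    credit = 0.0         # fractional updates carried over between steps
    num_updates = 0

    for step in range(num_env_steps):
        # Placeholder for: act in the environment and store the transition.
        replay_buffer.append(step)

        # Accumulate update credit so fractional ratios (e.g. 0.5) also work.
        credit += utd_ratio
        updates_now = int(credit)
        credit -= updates_now

        for _ in range(updates_now):
            # Placeholder for: sample a batch and take one gradient step.
            num_updates += 1

    return num_updates


if __name__ == "__main__":
    for ratio in (0.5, 1.0, 4.0):
        print(ratio, count_updates(100, ratio))
```

In this sketch, changing the ratio is a one-argument change to the loop, which is consistent with the abstract's remark that the method needs only a few lines of code, although the actual modification in SUF may differ.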

Cite

Text

Feng et al. "SUF: Stabilized Unconstrained Fine-Tuning for Offline-to-Online Reinforcement Learning." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I11.29083

Markdown

[Feng et al. "SUF: Stabilized Unconstrained Fine-Tuning for Offline-to-Online Reinforcement Learning." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/feng2024aaai-suf/) doi:10.1609/AAAI.V38I11.29083

BibTeX

@inproceedings{feng2024aaai-suf,
  title     = {{SUF: Stabilized Unconstrained Fine-Tuning for Offline-to-Online Reinforcement Learning}},
  author    = {Feng, Jiaheng and Feng, Mingxiao and Song, Haolin and Zhou, Wengang and Li, Houqiang},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {11961--11969},
  doi       = {10.1609/AAAI.V38I11.29083},
  url       = {https://mlanthology.org/aaai/2024/feng2024aaai-suf/}
}