Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning

Abstract

A compelling use case of offline reinforcement learning (RL) is to obtain an effective policy initialization from existing datasets, which allows efficient fine-tuning with a limited amount of active online interaction in the environment. However, many existing offline RL methods tend to exhibit poor fine-tuning performance. Conversely, while naive online RL methods achieve compelling empirical performance, they suffer from large sample complexity without a good policy initialization from offline data. Our goal in this paper is to devise an approach for learning an effective offline initialization that also unlocks fast online fine-tuning. Our approach, calibrated Q-learning (Cal-QL), accomplishes this by learning a conservative value function initialization that underestimates the value of the policy learned from offline data, while also being calibrated, meaning that the learned value estimates still upper-bound the ground-truth value of some other reference policy (e.g., the behavior policy). We show, both theoretically and empirically, that imposing these conditions speeds up online fine-tuning and brings in the benefits of the offline data. In practice, Cal-QL can be implemented on top of existing offline RL methods without any extra hyperparameter tuning. Empirically, Cal-QL outperforms state-of-the-art methods on a wide range of fine-tuning tasks with both state and visual observations, across several benchmarks.
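To make the calibration idea concrete, here is a minimal PyTorch-style sketch of a conservative regularizer with the clipping the abstract describes. The names (q_policy, q_data, ref_value) and the specific Monte-Carlo estimate of the reference value are illustrative assumptions, not the paper's released code; the intent is only to show how "conservative but calibrated" can be realized as a lower clip on the values being pushed down.

import torch

def calibrated_conservative_regularizer(q_policy: torch.Tensor,
                                        q_data: torch.Tensor,
                                        ref_value: torch.Tensor) -> torch.Tensor:
    # q_policy:  Q(s, a') for actions a' sampled from the learned policy
    # q_data:    Q(s, a) for (s, a) pairs taken from the offline dataset
    # ref_value: estimated value of a reference policy at s, e.g. the
    #            behavior policy's Monte-Carlo return-to-go (an assumption)
    #
    # Calibration: clip the values being pushed down from below by the
    # reference policy's value, so the learned Q-function stays an
    # upper bound on that reference policy while remaining conservative
    # with respect to the learned policy.
    calibrated_q = torch.maximum(q_policy, ref_value)
    # CQL-style conservatism: push down the (calibrated) policy values and
    # push up the values of actions actually observed in the data.
    return (calibrated_q - q_data).mean()

Under these assumptions, the change relative to a standard conservative regularizer is a single clipping operation, which is consistent with the abstract's claim that Cal-QL can be layered on top of existing offline RL methods without extra hyperparameter tuning.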

Cite

Text

Nakamoto et al. "Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning." ICLR 2023 Workshops: RRL, 2023.

Markdown

[Nakamoto et al. "Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning." ICLR 2023 Workshops: RRL, 2023.](https://mlanthology.org/iclrw/2023/nakamoto2023iclrw-calql/)

BibTeX

@inproceedings{nakamoto2023iclrw-calql,
  title     = {{Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning}},
  author    = {Nakamoto, Mitsuhiko and Zhai, Yuexiang and Singh, Anikait and Ma, Yi and Finn, Chelsea and Kumar, Aviral and Levine, Sergey},
  booktitle = {ICLR 2023 Workshops: RRL},
  year      = {2023},
  url       = {https://mlanthology.org/iclrw/2023/nakamoto2023iclrw-calql/}
}