Non-Adaptive Online Finetuning for Offline Reinforcement Learning
Abstract
Offline reinforcement learning (RL) has emerged as an important framework for applying RL to real-world problems. However, the complete lack of online interaction causes technical difficulties, and the _online finetuning_ setting addresses these challenges by incorporating a limited form of online interaction, which is often available in practice. Unfortunately, current theoretical frameworks for online finetuning either demand high online sample complexity and/or require deploying fully adaptive algorithms (i.e., with unlimited policy changes), which restricts their applicability to real-world settings where online interactions and policy updates are expensive and limited. In this paper, we develop a new framework for online finetuning. Instead of competing with the optimal policy (which inherits the high sample complexity and adaptivity requirements of online RL), we aim to learn a new policy that improves as much as possible over the existing policy using a _pre-specified_ number of online samples and a _non-adaptive_ data-collection policy. Our formulation reveals surprising nuances and suggests novel principles that distinguish the finetuning problem from purely online and offline RL.
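Below is a minimal Python sketch of the non-adaptive finetuning protocol described in the abstract, under assumed interfaces: `offline_rl`, `behavior_policy`, and the `env` methods are illustrative placeholders, not the paper's actual implementation. It shows the two defining constraints: the data-collection policy is fixed before any online interaction, and the online budget is pre-specified.

```python
# Illustrative sketch of non-adaptive online finetuning (all names are hypothetical).
# The data-collection policy is chosen *before* interaction begins and is never
# updated during collection, so only a single policy deployment is required.

def nonadaptive_finetune(offline_data, behavior_policy, env, n_online, offline_rl):
    """Collect a pre-specified budget of online samples with a fixed policy,
    then learn an improved policy from the pooled offline and online data."""
    online_data = []
    state = env.reset()
    for _ in range(n_online):                      # fixed, pre-specified budget
        action = behavior_policy(state)            # non-adaptive: policy never changes
        next_state, reward, done = env.step(action)
        online_data.append((state, action, reward, next_state, done))
        state = env.reset() if done else next_state

    # Any offline RL learner can be plugged in here; the objective is to improve
    # over the existing policy rather than to compete with the optimal policy.
    return offline_rl(offline_data + online_data)
```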
Cite
Text
Huang et al. "Non-Adaptive Online Finetuning for Offline Reinforcement Learning." NeurIPS 2023 Workshops: GenPlan, 2023.Markdown
[Huang et al. "Non-Adaptive Online Finetuning for Offline Reinforcement Learning." NeurIPS 2023 Workshops: GenPlan, 2023.](https://mlanthology.org/neuripsw/2023/huang2023neuripsw-nonadaptive/)BibTeX
@inproceedings{huang2023neuripsw-nonadaptive,
  title     = {{Non-Adaptive Online Finetuning for Offline Reinforcement Learning}},
  author    = {Huang, Audrey and Ghavamzadeh, Mohammad and Jiang, Nan and Petrik, Marek},
  booktitle = {NeurIPS 2023 Workshops: GenPlan},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/huang2023neuripsw-nonadaptive/}
}