MOTO: Offline to Online Fine-Tuning for Model-Based Reinforcement Learning
Abstract
We study the problem of offline-to-online reinforcement learning from high-dimensional pixel observations. While recent model-free approaches successfully use offline pre-training with online fine-tuning to either improve the performance of the data-collection policy or adapt to novel tasks, model-based approaches remain underutilized in this setting. In this work, we argue that existing methods for high-dimensional model-based offline RL are not suitable for offline-to-online fine-tuning due to issues with representation learning shifts, off-dynamics data, and non-stationary rewards. We propose a simple on-policy model-based method with adaptive behavior regularization. In our simulation experiments, we find that our approach successfully solves long-horizon robot manipulation tasks entirely from images by using a combination of offline data and online interactions.
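To make the "adaptive behavior regularization" idea concrete, the following Python sketch shows one plausible actor update: maximize a critic's value estimate (which, in a model-based setup, would be trained on rollouts of a learned world model) while penalizing deviation from the offline dataset's actions, with the penalty weight adapted by a dual-style update. This is a minimal illustration under assumed design choices; the class name AdaptiveBCActor, the quadratic behavior-cloning term, the target_bc threshold, and the stand-in critic are all hypothetical and are not the paper's implementation.

import torch
import torch.nn as nn

class AdaptiveBCActor(nn.Module):
    """Deterministic actor with an adaptively weighted behavior-cloning penalty."""

    def __init__(self, obs_dim: int, act_dim: int, target_bc: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, act_dim), nn.Tanh(),
        )
        # Log of the regularization weight; adapted with a dual-style update.
        self.log_alpha = nn.Parameter(torch.zeros(()))
        self.target_bc = target_bc  # assumed tolerance on the BC error

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

    def losses(self, obs, dataset_actions, critic):
        actions = self(obs)
        # Maximize the critic's value estimate (e.g., a critic trained on model rollouts).
        value_term = -critic(obs, actions).mean()
        # Behavior-cloning penalty keeps the policy close to the offline data.
        bc_term = ((actions - dataset_actions) ** 2).mean()
        alpha = self.log_alpha.exp().detach()
        actor_loss = value_term + alpha * bc_term
        # Dual update: alpha grows while the BC error exceeds the target and
        # shrinks as online data makes the constraint less necessary.
        alpha_loss = -self.log_alpha * (bc_term.detach() - self.target_bc)
        return actor_loss, alpha_loss

# Toy usage with a stand-in critic and random data.
critic = lambda obs, act: -(act ** 2).sum(-1, keepdim=True)
actor = AdaptiveBCActor(obs_dim=8, act_dim=2)
optim = torch.optim.Adam(actor.parameters(), lr=3e-4)
obs = torch.randn(32, 8)
dataset_actions = torch.rand(32, 2) * 2 - 1
actor_loss, alpha_loss = actor.losses(obs, dataset_actions, critic)
optim.zero_grad()
(actor_loss + alpha_loss).backward()
optim.step()

In a full model-based pipeline the observations would be latent states from a world model trained on pixels, and the divergence measure and adaptation rule would follow the method described in the paper rather than the placeholders used here.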
Cite
Text
Rafailov et al. "MOTO: Offline to Online Fine-Tuning for Model-Based Reinforcement Learning." ICLR 2023 Workshops: RRL, 2023.

Markdown

[Rafailov et al. "MOTO: Offline to Online Fine-Tuning for Model-Based Reinforcement Learning." ICLR 2023 Workshops: RRL, 2023.](https://mlanthology.org/iclrw/2023/rafailov2023iclrw-moto/)

BibTeX
@inproceedings{rafailov2023iclrw-moto,
  title     = {{MOTO: Offline to Online Fine-Tuning for Model-Based Reinforcement Learning}},
  author    = {Rafailov, Rafael and Hatch, Kyle Beltran and Kolev, Victor and Martin, John D and Phielipp, Mariano and Finn, Chelsea},
  booktitle = {ICLR 2023 Workshops: RRL},
  year      = {2023},
  url       = {https://mlanthology.org/iclrw/2023/rafailov2023iclrw-moto/}
}