Towards Zero-Shot Generalization in Offline Reinforcement Learning
Abstract
In this work, we study offline reinforcement learning (RL) with the zero-shot generalization (ZSG) property, where the agent has access to an offline dataset of experiences collected from different environments, and the goal is to train a policy over the training environments that performs well on test environments without further interaction. Existing work has shown that classical offline RL fails to generalize to new, unseen environments. To address this issue, we propose new offline RL frameworks with ZSG, based on empirical risk minimization or proximal policy optimization. We show, both theoretically and empirically, that our frameworks find near-optimal policies with ZSG, in settings ranging from general environments to specific cases such as linear Markov decision processes (MDPs). Our results serve as a first step toward understanding the foundations of the generalization phenomenon in offline reinforcement learning.
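The abstract describes an empirical-risk-minimization-based framework trained on offline data pooled from multiple environments and evaluated zero-shot on unseen ones. The sketch below is a rough illustration of that setup only, not the paper's algorithm: it pools offline (state, action) data from several training environments, minimizes the average per-environment empirical risk with a behavior-cloning surrogate loss and a softmax-linear policy, and then applies the learned policy to a new environment without fine-tuning. All dimensions, hyperparameters, and the choice of surrogate loss are illustrative assumptions.

```python
# Illustrative sketch (not the paper's algorithm): ERM over offline data
# pooled from several training environments, followed by zero-shot use.
import numpy as np

rng = np.random.default_rng(0)
S, A, N_ENVS, N = 8, 4, 5, 200   # state dim, #actions, #training envs, samples/env

# Synthetic offline data: each environment contributes (state, action) pairs
# collected by some unknown behavior policy.
states = [rng.normal(size=(N, S)) for _ in range(N_ENVS)]
actions = [rng.integers(A, size=N) for _ in range(N_ENVS)]

W = np.zeros((S, A))             # softmax-linear policy parameters

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def per_env_loss_and_grad(W, X, a):
    """Behavior-cloning (cross-entropy) risk and its gradient for one environment."""
    p = softmax(X @ W)                           # (N, A) action probabilities
    loss = -np.log(p[np.arange(len(a)), a]).mean()
    onehot = np.eye(A)[a]
    grad = X.T @ (p - onehot) / len(a)           # gradient of the empirical risk
    return loss, grad

# ERM over training environments: minimize the average per-environment risk,
# so the learned policy is shared across environments rather than tailored to one.
for step in range(500):
    losses, grads = [], []
    for X, a in zip(states, actions):
        loss, g = per_env_loss_and_grad(W, X, a)
        losses.append(loss)
        grads.append(g)
    W -= 0.5 * np.mean(grads, axis=0)
    if step % 100 == 0:
        print(f"step {step:3d}  avg training risk {np.mean(losses):.3f}")

# Zero-shot use on a new, unseen environment: act with the same W,
# with no further interaction or adaptation.
new_state = rng.normal(size=(1, S))
print("greedy action in unseen environment:", int(np.argmax(new_state @ W)))
```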
Cite
Text
Wang et al. "Towards Zero-Shot Generalization in Offline Reinforcement Learning." ICML 2024 Workshops: ARLET, 2024.
Markdown
[Wang et al. "Towards Zero-Shot Generalization in Offline Reinforcement Learning." ICML 2024 Workshops: ARLET, 2024.](https://mlanthology.org/icmlw/2024/wang2024icmlw-zeroshot/)
BibTeX
@inproceedings{wang2024icmlw-zeroshot,
  title = {{Towards Zero-Shot Generalization in Offline Reinforcement Learning}},
  author = {Wang, Zhiyong and Yang, Chen and Lui, John C.S. and Zhou, Dongruo},
  booktitle = {ICML 2024 Workshops: ARLET},
  year = {2024},
  url = {https://mlanthology.org/icmlw/2024/wang2024icmlw-zeroshot/}
}