Active Learning for Iterative Offline Reinforcement Learning

Abstract

Offline Reinforcement Learning (RL) has emerged as a promising approach to real-world challenges where online interactions with the environment are limited, risky, or costly. Although recent advancements produce high-quality policies from offline data, there is currently no systematic methodology to continue improving them without resorting to online fine-tuning. This paper proposes to repurpose Offline RL to produce a sequence of improving policies, namely, Iterative Offline Reinforcement Learning (IORL). To produce such a sequence, IORL has to cope with imbalanced offline datasets and perform controlled environment exploration. Specifically, we introduce "Return-based Sampling" as a means to selectively prioritize experience from high-return trajectories, and active-learning-driven "Dataset Uncertainty Sampling" to probe state-actions with probability inversely proportional to their density in the dataset. We demonstrate that our proposed approach produces policies that achieve monotonically increasing average returns, from 65.4 to 140.2, in the Atari environment.
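
The two sampling schemes named in the abstract can be illustrated with a minimal sketch. The specific weighting forms below (a softmax over trajectory returns and inverse state-action counts) are assumptions for illustration only, since the paper's exact formulas are not reproduced on this page.

import numpy as np

def return_based_weights(trajectory_returns, temperature=1.0):
    # Softmax weights over trajectories, favoring high-return ones
    # (illustrative stand-in for "Return-based Sampling").
    r = np.asarray(trajectory_returns, dtype=np.float64)
    z = (r - r.max()) / temperature  # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()

def inverse_density_weights(state_action_counts, eps=1e-6):
    # Probe probabilities inversely proportional to how often a
    # state-action appears in the dataset (illustrative stand-in for
    # "Dataset Uncertainty Sampling").
    c = np.asarray(state_action_counts, dtype=np.float64) + eps
    w = 1.0 / c
    return w / w.sum()

# Hypothetical usage: weight trajectories for the next offline training
# round, and pick under-represented state-actions to probe when collecting
# the next dataset in the iteration.
p_train = return_based_weights([12.0, 87.5, 140.2, 65.4], temperature=10.0)
p_probe = inverse_density_weights([500, 40, 3, 120])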

Cite

Text

Zhang et al. "Active Learning for Iterative Offline Reinforcement Learning." NeurIPS 2023 Workshops: ReALML, 2023.

Markdown

[Zhang et al. "Active Learning for Iterative Offline Reinforcement Learning." NeurIPS 2023 Workshops: ReALML, 2023.](https://mlanthology.org/neuripsw/2023/zhang2023neuripsw-active/)

BibTeX

@inproceedings{zhang2023neuripsw-active,
  title     = {{Active Learning for Iterative Offline Reinforcement Learning}},
  author    = {Zhang, Lan and Tedesco, Luigi Franco and Rajak, Pankaj and Zemmouri, Youcef and Brunzell, Hakan},
  booktitle = {NeurIPS 2023 Workshops: ReALML},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/zhang2023neuripsw-active/}
}