VRL3: A Data-Driven Framework for Visual Deep Reinforcement Learning

Abstract

We propose VRL3, a powerful data-driven framework with a simple design for solving challenging visual deep reinforcement learning (DRL) tasks. We analyze a number of major obstacles in taking a data-driven approach, and present a suite of design principles, novel findings, and critical insights about data-driven visual DRL. Our framework has three stages: in stage 1, we leverage non-RL datasets (e.g. ImageNet) to learn task-agnostic visual representations; in stage 2, we use offline RL data (e.g. a limited number of expert demonstrations) to convert the task-agnostic representations into more powerful task-specific representations; in stage 3, we fine-tune the agent with online RL. On a set of challenging hand manipulation tasks with sparse reward and realistic visual inputs, compared to the previous SOTA, VRL3 achieves an average of 780% better sample efficiency. And on the hardest task, VRL3 is 1220% more sample efficient (2440% when using a wider encoder) and solves the task with only 10% of the computation. These significant results clearly demonstrate the great potential of data-driven deep reinforcement learning.
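To make the three-stage pipeline described in the abstract concrete, below is a minimal sketch in PyTorch. The encoder and critic architectures, the stage-1 classification objective, the simplified one-step TD updates, and the random-tensor placeholders (standing in for ImageNet images, demonstrations, and environment interaction) are illustrative assumptions, not the paper's actual implementation; the sketch only shows how a single shared encoder is first pretrained on non-RL data, then updated with offline RL on demonstrations, and finally fine-tuned with online RL.

# Minimal sketch of the three-stage pipeline described in the abstract.
# Architectures, objectives, and data below are simplified stand-ins
# (assumptions), not the paper's exact implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvEncoder(nn.Module):
    """Tiny convolutional encoder mapping 84x84 RGB frames to a latent vector."""
    def __init__(self, latent_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        feat_dim = self.conv(torch.zeros(1, 3, 84, 84)).shape[1]
        self.proj = nn.Linear(feat_dim, latent_dim)

    def forward(self, obs):
        return self.proj(self.conv(obs))

class QHead(nn.Module):
    """State-action value head on top of the shared encoder latent."""
    def __init__(self, latent_dim=64, action_dim=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + action_dim, 128), nn.ReLU(), nn.Linear(128, 1)
        )

    def forward(self, z, a):
        return self.net(torch.cat([z, a], dim=-1))

def stage1_pretrain(encoder, images, labels, steps=10):
    """Stage 1: task-agnostic pretraining on a non-RL dataset
    (illustrated here with a plain classification objective; the head is discarded)."""
    head = nn.Linear(64, labels.max().item() + 1)
    opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-3)
    for _ in range(steps):
        loss = F.cross_entropy(head(encoder(images)), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()

def stage2_offline(encoder, critic, demos, steps=10, gamma=0.99):
    """Stage 2: offline RL on demonstration data, turning the task-agnostic
    representation into a task-specific one (simplified one-step TD update)."""
    opt = torch.optim.Adam(list(encoder.parameters()) + list(critic.parameters()), lr=1e-3)
    obs, act, rew, next_obs, next_act = demos
    for _ in range(steps):
        with torch.no_grad():
            target = rew + gamma * critic(encoder(next_obs), next_act)
        loss = F.mse_loss(critic(encoder(obs), act), target)
        opt.zero_grad()
        loss.backward()
        opt.step()

def stage3_online(encoder, critic, env_step, steps=10, gamma=0.99):
    """Stage 3: online RL fine-tuning; env_step() is a placeholder that
    returns (obs, act, rew, next_obs, next_act) batches from interaction."""
    opt = torch.optim.Adam(list(encoder.parameters()) + list(critic.parameters()), lr=1e-3)
    for _ in range(steps):
        obs, act, rew, next_obs, next_act = env_step()
        with torch.no_grad():
            target = rew + gamma * critic(encoder(next_obs), next_act)
        loss = F.mse_loss(critic(encoder(obs), act), target)
        opt.zero_grad()
        loss.backward()
        opt.step()

if __name__ == "__main__":
    enc, q = ConvEncoder(), QHead()
    # Random placeholder data in lieu of ImageNet, demonstrations, and an environment.
    imgs, lbls = torch.randn(8, 3, 84, 84), torch.randint(0, 5, (8,))
    demo = (torch.randn(8, 3, 84, 84), torch.randn(8, 4), torch.randn(8, 1),
            torch.randn(8, 3, 84, 84), torch.randn(8, 4))
    stage1_pretrain(enc, imgs, lbls)
    stage2_offline(enc, q, demo)
    stage3_online(enc, q, env_step=lambda: demo)

The key design choice the sketch tries to convey is that all three stages train the same encoder, so the representations learned from cheap non-RL data and limited offline demonstrations carry over directly into online fine-tuning.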

Cite

Text

Wang et al. "VRL3: A Data-Driven Framework for Visual Deep Reinforcement Learning." Neural Information Processing Systems, 2022.

Markdown

[Wang et al. "VRL3: A Data-Driven Framework for Visual Deep Reinforcement Learning." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/wang2022neurips-vrl3/)

BibTeX

@inproceedings{wang2022neurips-vrl3,
  title     = {{VRL3: A Data-Driven Framework for Visual Deep Reinforcement Learning}},
  author    = {Wang, Che and Luo, Xufang and Ross, Keith and Li, Dongsheng},
  booktitle = {Neural Information Processing Systems},
  year      = {2022},
  url       = {https://mlanthology.org/neurips/2022/wang2022neurips-vrl3/}
}