FuRL: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning

ICML 2024, pp. 14256-14274

Abstract

In this work, we investigate how to leverage pre-trained visual-language models (VLMs) for online Reinforcement Learning (RL). In particular, we focus on sparse-reward tasks with pre-defined textual task descriptions. We first identify the problem of reward misalignment that arises when a VLM is used as a reward signal in RL tasks. To address this issue, we introduce a lightweight fine-tuning method, named Fuzzy VLM reward-aided RL (FuRL), based on reward alignment and relay RL. Specifically, we enhance the performance of SAC/DrQ baseline agents on sparse-reward tasks by fine-tuning VLM representations and using relay RL to avoid local minima. Extensive experiments on Meta-world benchmark tasks demonstrate the efficacy of the proposed method. Code is available at: https://github.com/fuyw/FuRL.
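The abstract describes the recipe only at a high level. As a rough illustration of the baseline "VLM as reward" setup that FuRL builds on, the sketch below scores each image observation by its CLIP similarity to the textual task description and mixes that score into the sparse environment reward. The model choice, the vlm_reward helper, and the mixing weight lambda_ are illustrative assumptions, not the paper's implementation; FuRL additionally fine-tunes the VLM representations (reward alignment) and uses relay RL to escape local minima.

# Minimal sketch of the generic "VLM as reward" idea, assuming a CLIP
# backbone from Hugging Face transformers. Not the authors' FuRL code.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

task_description = "open the drawer"  # pre-defined textual task description

@torch.no_grad()
def vlm_reward(image) -> float:
    """Cosine similarity between an observation image (PIL image or
    numpy array) and the task description text."""
    inputs = processor(text=[task_description], images=image,
                       return_tensors="pt", padding=True)
    img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    return (img_emb * txt_emb).sum().item()

# Inside the RL loop, the fuzzy reward augments the sparse task reward,
# e.g. r_total = r_sparse + lambda_ * vlm_reward(observation),
# where lambda_ is a hypothetical mixing coefficient.

In practice this raw similarity is a noisy, "fuzzy" signal that can be misaligned with task progress, which is precisely the issue the paper's reward-alignment fine-tuning is designed to address.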

Cite

Text

Fu et al. "FuRL: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning." International Conference on Machine Learning, 2024.

Markdown

[Fu et al. "FuRL: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/fu2024icml-furl/)

BibTeX

@inproceedings{fu2024icml-furl,
  title     = {{FuRL: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning}},
  author    = {Fu, Yuwei and Zhang, Haichao and Wu, Di and Xu, Wei and Boulet, Benoit},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {14256--14274},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/fu2024icml-furl/}
}