Reinforcement Learning from Human Text Feedback: Learning a Reward Model from Human Text Input

Abstract

We explore the use of human-generated text inputs to model rewards in Reinforcement Learning from Human Feedback (RLHF). Human text contains rich and nuanced information, yet most previous work relies on preference feedback or restricts the structure of the text. We propose using Large Language Models (LLMs) to harness the information in natural text and train a reward model efficiently. Our empirical evaluations demonstrate the advantages of this approach in both tabular and continuous reinforcement learning tasks. The results show that, even with minimal human interaction, integrating text feedback through LLMs enables our method to approximate the reward function accurately, leading to significant performance improvements.
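To make the idea in the abstract concrete, the sketch below shows one plausible way to turn free-form text feedback into reward labels with an LLM and then fit a reward model on those labels. It is an illustrative assumption, not the authors' implementation: `llm_score` is a hypothetical placeholder for an actual LLM query, and the ridge regressor stands in for whatever reward-model class the paper uses.

```python
# Minimal sketch (assumptions, not the paper's code): an LLM distills human
# text feedback into scalar reward labels, and a simple regressor is fit on
# those labels to serve as the learned reward model.

import numpy as np
from sklearn.linear_model import Ridge


def llm_score(feedback: str, transition_text: str) -> float:
    """Hypothetical placeholder for an LLM call that reads the human's text
    feedback and a textual description of a (state, action) pair, returning
    a scalar score, e.g. in [-1, 1]. A real system would prompt an LLM here."""
    return 0.0  # dummy value for the sketch


def fit_reward_model(feedback: str, transitions):
    """transitions: list of (feature_vector, text_description) pairs."""
    X = np.stack([feats for feats, _ in transitions])
    y = np.array([llm_score(feedback, desc) for _, desc in transitions])
    # Stand-in reward model; the paper's model class may differ.
    return Ridge(alpha=1.0).fit(X, y)


# Toy usage with illustrative 2-D features and descriptions.
transitions = [
    (np.array([0.1, 0.9]), "agent moved toward the goal"),
    (np.array([0.8, 0.2]), "agent collided with the wall"),
]
reward_model = fit_reward_model("Avoid walls and reach the goal quickly.", transitions)
print(reward_model.predict(np.array([[0.1, 0.9]])))
```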

Cite

Text

Urcelay et al. "Reinforcement Learning from Human Text Feedback: Learning a Reward Model from Human Text Input." ICML 2024 Workshops: MFHAIA, 2024.

Markdown

[Urcelay et al. "Reinforcement Learning from Human Text Feedback: Learning a Reward Model from Human Text Input." ICML 2024 Workshops: MFHAIA, 2024.](https://mlanthology.org/icmlw/2024/urcelay2024icmlw-reinforcement/)

BibTeX

@inproceedings{urcelay2024icmlw-reinforcement,
  title     = {{Reinforcement Learning from Human Text Feedback: Learning a Reward Model from Human Text Input}},
  author    = {Urcelay, Belen Martin and Krause, Andreas and Ramponi, Giorgia},
  booktitle = {ICML 2024 Workshops: MFHAIA},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/urcelay2024icmlw-reinforcement/}
}