AlpacaFarm: A Simulation Framework for Methods That Learn from Human Feedback

Abstract

Large language models (LLMs) such as ChatGPT have seen widespread adoption due to their ability to follow user instructions well. Developing these LLMs involves a complex yet poorly understood workflow requiring training with human feedback. Replicating and understanding this instruction-following process faces three major challenges: the high cost of data collection, the lack of trustworthy evaluation, and the absence of reference method implementations. We address these bottlenecks with AlpacaFarm, a simulator that enables research and development for learning from feedback at a low cost. First, we design an LLM-based simulator for human feedback that is 45x cheaper than crowdworkers and displays high agreement with humans. Second, we identify an evaluation dataset representative of real-world instructions and propose an automatic evaluation procedure. Third, we contribute reference implementations for several methods (PPO, best-of-n, expert iteration, among others) that learn from pairwise feedback. Finally, as an end-to-end validation of AlpacaFarm, we train and evaluate eleven models on 10k pairs of human feedback and show that rankings of models trained in AlpacaFarm match rankings of models trained on human data. As a demonstration of the research possible in AlpacaFarm, we find that methods that use a reward model can substantially improve over supervised fine-tuning and that our reference PPO implementation leads to a +10% win-rate improvement against Davinci003.
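
To make two terms from the abstract concrete, the following is a minimal sketch of best-of-n selection and of a win-rate computed from pairwise preferences. It is illustrative only and is not the AlpacaFarm reference implementation: the function names `best_of_n`, `win_rate`, and `toy_reward` are hypothetical, the reward function is a stand-in for a learned reward model, and the win-rate here assumes binary preferences with no ties.

```python
from typing import Callable, Sequence


def best_of_n(candidates: Sequence[str], reward_fn: Callable[[str], float]) -> str:
    """Return the candidate response with the highest reward score."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=reward_fn)


def win_rate(preferences: Sequence[int]) -> float:
    """Fraction of pairwise comparisons won against a baseline.

    preferences[i] is 1 if the model's output was preferred over the
    baseline's output on example i, and 0 otherwise (assumption: no ties).
    """
    if not preferences:
        raise ValueError("need at least one comparison")
    return sum(preferences) / len(preferences)


def toy_reward(text: str) -> float:
    """Hypothetical stand-in for a reward model: scores by word count."""
    return float(len(text.split()))


if __name__ == "__main__":
    samples = ["Yes.", "Yes, because the sum is even.", "Yes, it is."]
    print(best_of_n(samples, toy_reward))
    print(win_rate([1, 0, 1, 1]))
```

In practice the n candidates would be sampled from a fine-tuned model and scored by a reward model trained on pairwise feedback; the toy reward above only keeps the sketch self-contained.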

Cite

Text

Dubois et al. "AlpacaFarm: A Simulation Framework for Methods That Learn from Human Feedback." Neural Information Processing Systems, 2023.

Markdown

[Dubois et al. "AlpacaFarm: A Simulation Framework for Methods That Learn from Human Feedback." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/dubois2023neurips-alpacafarm/)

BibTeX

@inproceedings{dubois2023neurips-alpacafarm,
  title     = {{AlpacaFarm: A Simulation Framework for Methods That Learn from Human Feedback}},
  author    = {Dubois, Yann and Li, Chen Xuechen and Taori, Rohan and Zhang, Tianyi and Gulrajani, Ishaan and Ba, Jimmy and Guestrin, Carlos and Liang, Percy and Hashimoto, Tatsunori B},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/dubois2023neurips-alpacafarm/}
}