WizardArena: Post-Training Large Language Models via Simulated Offline Chatbot Arena

Abstract

Recent work demonstrates that post-training large language models with open-domain instruction-following data achieves remarkable success. Simultaneously, the human-judged Chatbot Arena has emerged as one of the most trusted benchmarks for model evaluation and developmental guidance. However, manually curating high-quality training data and relying on online human evaluation platforms are both expensive and slow. To mitigate the manual and temporal costs associated with post-training, this paper introduces a simulated chatbot arena named WizardArena, which is fully based on and powered by open-source LLMs. For evaluation, WizardArena can efficiently predict accurate performance rankings among different models based on an offline test set. For training, we simulate arena battles among various state-of-the-art models on a large scale of instruction data, subsequently leveraging the battle results to constantly enhance the target model through both supervised fine-tuning and reinforcement learning. Experimental results demonstrate that our WizardArena aligns closely with the online human arena rankings, and that models trained on our extensive offline battle data exhibit significant performance improvements during the SFT, DPO, and PPO stages.
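Arena-style rankings like those the abstract describes are conventionally derived from pairwise battle outcomes via Elo-style ratings (as in the original Chatbot Arena). The paper's exact ranking procedure may differ; the following is a minimal sketch of that general idea, with made-up model names and K-factor:

```python
def elo_update(r_a, r_b, score_a, k=32):
    """Update two Elo ratings after one battle.
    score_a is 1.0 if A wins, 0.0 if A loses, 0.5 for a tie."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    r_a_new = r_a + k * (score_a - expected_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return r_a_new, r_b_new


def rank_models(battles, init=1000.0):
    """Rank models from a list of (model_a, model_b, score_a) battles.
    Returns (model, rating) pairs sorted best-first."""
    ratings = {}
    for a, b, score_a in battles:
        ra = ratings.setdefault(a, init)
        rb = ratings.setdefault(b, init)
        ratings[a], ratings[b] = elo_update(ra, rb, score_a)
    return sorted(ratings.items(), key=lambda kv: -kv[1])


# Hypothetical battle log: model-x beats both others, model-y beats model-z.
ranking = rank_models([
    ("model-x", "model-y", 1.0),
    ("model-y", "model-z", 1.0),
    ("model-x", "model-z", 1.0),
])
```

In practice, Chatbot Arena fits ratings with a Bradley-Terry model over all battles rather than sequential Elo updates, but the pairwise-comparison principle is the same.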

Cite

Text

Luo et al. "WizardArena: Post-Training Large Language Models via Simulated Offline Chatbot Arena." Neural Information Processing Systems, 2024. doi:10.52202/079017-3543

Markdown

[Luo et al. "WizardArena: Post-Training Large Language Models via Simulated Offline Chatbot Arena." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/luo2024neurips-wizardarena/) doi:10.52202/079017-3543

BibTeX

@inproceedings{luo2024neurips-wizardarena,
  title     = {{WizardArena: Post-Training Large Language Models via Simulated Offline Chatbot Arena}},
  author    = {Luo, Haipeng and Sun, Qingfeng and Xu, Can and Zhao, Pu and Lin, Qingwei and Lou, Jianguang and Chen, Shifeng and Tang, Yansong and Chen, Weizhu},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-3543},
  url       = {https://mlanthology.org/neurips/2024/luo2024neurips-wizardarena/}
}