A Merge Sort Based Ranking System for the Evaluation of Large Language Models
Abstract
Efficient and accurate evaluation of Large Language Models (LLMs) is essential for progress in natural language processing. To address this, our paper introduces Transitive Merge Sort (TMS), a novel method that harnesses merge sort's efficiency, stability, and parallelizability for model ranking in LLM evaluation. This approach applies a divide-and-conquer strategy to pairwise comparisons, streamlining the evaluation process. Our experimental findings reveal that TMS not only improves the accuracy of model rankings compared to methods such as Elo rating and SuperCLUE (compared with GPT-3.5) but also reduces the need for annotation resources by up to 70%. Additionally, we present an iterated version of TMS that effectively handles scenarios where initial model rankings are unknown.
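The core idea in the abstract can be sketched as a merge sort whose comparator is a pairwise judgment between two models. The sketch below is illustrative only: the `beats` judge and the `skill` table are hypothetical stand-ins for an LLM-based pairwise evaluator, and the paper's actual TMS details (transitivity handling, parallel scheduling, the iterated variant) are not reproduced here.

```python
def merge_rank(models, beats):
    """Rank `models` best-first using a pairwise comparison `beats(a, b)`.

    `beats(a, b)` returns True if model a wins the head-to-head comparison
    against model b. Merge sort needs only O(n log n) comparisons, versus
    O(n^2) for a full round-robin tournament.
    """
    if len(models) <= 1:
        return list(models)
    mid = len(models) // 2
    # The two halves are independent, so they could be evaluated in parallel.
    left = merge_rank(models[:mid], beats)
    right = merge_rank(models[mid:], beats)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if beats(left[i], right[j]):
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

# Toy judge: each model has a hidden "true skill" (hypothetical); in
# practice the judge would be an evaluator comparing two models' answers.
skill = {"A": 3, "B": 1, "C": 4, "D": 2}
ranking = merge_rank(["A", "B", "C", "D"], lambda a, b: skill[a] > skill[b])
# → ["C", "A", "D", "B"]
```

For n models, this requires at most n·log₂(n) pairwise judgments, which is the source of the annotation savings the abstract reports relative to round-robin schemes such as Elo.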
Cite
Text
Li et al. "A Merge Sort Based Ranking System for the Evaluation of Large Language Models." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2024. doi:10.1007/978-3-031-70378-2_15
Markdown
[Li et al. "A Merge Sort Based Ranking System for the Evaluation of Large Language Models." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2024.](https://mlanthology.org/ecmlpkdd/2024/li2024ecmlpkdd-merge/) doi:10.1007/978-3-031-70378-2_15
BibTeX
@inproceedings{li2024ecmlpkdd-merge,
title = {{A Merge Sort Based Ranking System for the Evaluation of Large Language Models}},
author = {Li, Chenchen and Shi, Linfeng and Zhou, Chunyi and Huan, Zhaoxin and Tang, Chengfu and Zhang, Xiaolu and Wang, Xudong and Zhou, Jun and Liu, Song},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2024},
pages = {240--255},
doi = {10.1007/978-3-031-70378-2_15},
url = {https://mlanthology.org/ecmlpkdd/2024/li2024ecmlpkdd-merge/}
}