A Merge Sort Based Ranking System for the Evaluation of Large Language Models

Abstract

Efficient and accurate evaluation of Large Language Models (LLMs) is essential for progress in the field of natural language processing. To address this, our paper introduces Transitive Merge Sort (TMS), a novel method that harnesses merge sort's efficiency, stability, and parallelizability for model ranking in LLM evaluation. This approach applies a divide-and-conquer strategy to pairwise comparisons, streamlining the evaluation process. Our experimental findings reveal that TMS not only improves the accuracy of model rankings compared to methods such as Elo rating and SuperCLUE (compared with GPT-3.5) but also reduces the need for annotation resources by up to 70%. Additionally, we present an iterated version of TMS that effectively handles scenarios where initial model rankings are unknown.
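The merge-sort backbone described above can be sketched as follows. This is a minimal illustration, not the authors' TMS implementation: the `better` oracle, the model names, and the stand-in scores are all hypothetical, standing in for the paper's pairwise human or LLM judgments, and the transitivity and iteration mechanisms of TMS are omitted.

```python
# Illustrative sketch (not the paper's implementation): ranking models with
# merge sort, where each comparison is one pairwise judgment call.
from typing import Callable, List

def merge_sort_rank(models: List[str],
                    better: Callable[[str, str], bool]) -> List[str]:
    """Return models ordered best-first; better(a, b) is a pairwise judgment."""
    if len(models) <= 1:
        return models
    mid = len(models) // 2
    # The two halves are independent, so they could be judged in parallel.
    left = merge_sort_rank(models[:mid], better)
    right = merge_sort_rank(models[mid:], better)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if better(right[j], left[i]):
            merged.append(right[j])
            j += 1
        else:
            # Stable: on a tie, the left element keeps its earlier position.
            merged.append(left[i])
            i += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

# Hypothetical quality scores standing in for real pairwise judgments.
scores = {"model_a": 0.62, "model_b": 0.88, "model_c": 0.75, "model_d": 0.51}
ranking = merge_sort_rank(list(scores), lambda x, y: scores[x] > scores[y])
# ranking == ["model_b", "model_c", "model_a", "model_d"]
```

This structure needs only O(n log n) pairwise judgments rather than the O(n^2) of a full round-robin tournament, which is the basic source of the annotation savings the abstract reports.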

Cite

Text

Li et al. "A Merge Sort Based Ranking System for the Evaluation of Large Language Models." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2024. doi:10.1007/978-3-031-70378-2_15

Markdown

[Li et al. "A Merge Sort Based Ranking System for the Evaluation of Large Language Models." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2024.](https://mlanthology.org/ecmlpkdd/2024/li2024ecmlpkdd-merge/) doi:10.1007/978-3-031-70378-2_15

BibTeX

@inproceedings{li2024ecmlpkdd-merge,
  title     = {{A Merge Sort Based Ranking System for the Evaluation of Large Language Models}},
  author    = {Li, Chenchen and Shi, Linfeng and Zhou, Chunyi and Huan, Zhaoxin and Tang, Chengfu and Zhang, Xiaolu and Wang, Xudong and Zhou, Jun and Liu, Song},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2024},
  pages     = {240--255},
  doi       = {10.1007/978-3-031-70378-2_15},
  url       = {https://mlanthology.org/ecmlpkdd/2024/li2024ecmlpkdd-merge/}
}