A Merge Sort Based Ranking System for the Evaluation of Large Language Models

Abstract

Efficient and accurate evaluation of Large Language Models (LLMs) is essential for progress in the field of natural language processing. To address this, our paper introduces Transitive Merge Sort (TMS), a novel method that harnesses merge sort's efficiency, stability, and parallelizability for model ranking in LLM evaluation. This approach applies a divide-and-conquer strategy to pairwise comparisons, streamlining the evaluation process. Our experimental findings reveal that TMS not only improves the accuracy of model rankings compared to methods such as Elo rating and SuperCLUE (compared with GPT-3.5) but also reduces the need for annotation resources by up to 70%. Additionally, we present an iterated version of TMS that effectively handles scenarios where initial model rankings are unknown.
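The merge-sort backbone described above can be sketched as follows. This is a minimal illustration, not the authors' TMS implementation: the `better` oracle, the model names, and the stand-in scores are all hypothetical, standing in for the paper's pairwise human or LLM judgments, and the transitivity and iteration mechanisms of TMS are omitted.

```python
# Illustrative sketch (not the paper's implementation): ranking models with
# merge sort, where each comparison is one pairwise judgment call.
from typing import Callable, List

def merge_sort_rank(models: List[str],
                    better: Callable[[str, str], bool]) -> List[str]:
    """Return models ordered best-first; better(a, b) is a pairwise judgment."""
    if len(models) <= 1:
        return models
    mid = len(models) // 2
    # The two halves are independent, so they could be judged in parallel.
    left = merge_sort_rank(models[:mid], better)
    right = merge_sort_rank(models[mid:], better)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if better(right[j], left[i]):
            merged.append(right[j])
            j += 1
        else:
            # Stable: on a tie, the left element keeps its earlier position.
            merged.append(left[i])
            i += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

# Hypothetical quality scores standing in for real pairwise judgments.
scores = {"model_a": 0.62, "model_b": 0.88, "model_c": 0.75, "model_d": 0.51}
ranking = merge_sort_rank(list(scores), lambda x, y: scores[x] > scores[y])
# ranking == ["model_b", "model_c", "model_a", "model_d"]
```

This structure needs only O(n log n) pairwise judgments rather than the O(n^2) of a full round-robin tournament, which is the basic source of the annotation savings the abstract reports.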

Cite

Text

Li et al. "A Merge Sort Based Ranking System for the Evaluation of Large Language Models." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2024. doi:10.1007/978-3-031-70378-2_15

Markdown

[Li et al. "A Merge Sort Based Ranking System for the Evaluation of Large Language Models." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2024.](https://mlanthology.org/ecmlpkdd/2024/li2024ecmlpkdd-merge/) doi:10.1007/978-3-031-70378-2_15

BibTeX

@inproceedings{li2024ecmlpkdd-merge,
  title     = {{A Merge Sort Based Ranking System for the Evaluation of Large Language Models}},
  author    = {Li, Chenchen and Shi, Linfeng and Zhou, Chunyi and Huan, Zhaoxin and Tang, Chengfu and Zhang, Xiaolu and Wang, Xudong and Zhou, Jun and Liu, Song},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2024},
  pages     = {240--255},
  doi       = {10.1007/978-3-031-70378-2_15},
  url       = {https://mlanthology.org/ecmlpkdd/2024/li2024ecmlpkdd-merge/}
}