KoLA: Carefully Benchmarking World Knowledge of Large Language Models
Abstract
The unprecedented performance of large language models (LLMs) necessitates improvements in evaluations. Rather than merely exploring the breadth of LLM abilities, we believe meticulous and thoughtful designs are essential to thorough, unbiased, and applicable evaluations. Given the importance of world knowledge to LLMs, we construct a Knowledge-oriented LLM Assessment benchmark (KoLA), in which we carefully design three crucial factors: (1) For ability modeling, we mimic human cognition to form a four-level taxonomy of knowledge-related abilities, covering 19 tasks. (2) For data, to ensure fair comparisons, we use both Wikipedia, a corpus on which LLMs are prevalently pre-trained, and continuously collected emerging corpora, aiming to evaluate the capacity to handle unseen data and evolving knowledge. (3) For evaluation criteria, we adopt a contrastive system, including overall standard scores for better numerical comparability across tasks and models, and a unique self-contrast metric for automatically evaluating knowledge-creating ability. We evaluate 21 open-source and commercial LLMs and obtain some intriguing findings. The KoLA dataset will be updated every three months to provide timely references for developing LLMs and knowledge-related systems.
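The abstract refers to "overall standard scores" that make results numerically comparable across tasks and models. The paper's exact formula is not given here, so the sketch below is only a minimal illustration of the general idea (a z-score-style rescaling of each task's raw metrics against all evaluated models); the function and variable names are hypothetical, not KoLA's implementation.

```python
# Illustrative sketch (assumption): rescale raw per-model scores on one task
# so that values are centered and scaled relative to all evaluated models,
# making numbers roughly comparable across tasks with different metric ranges.
from statistics import mean, stdev


def standardize_scores(raw_scores: dict[str, float]) -> dict[str, float]:
    """Map raw per-model scores on a single task to z-score-like standard scores."""
    values = list(raw_scores.values())
    mu = mean(values)
    sigma = stdev(values) or 1.0  # guard against zero spread
    return {model: (score - mu) / sigma for model, score in raw_scores.items()}


# Example usage with hypothetical raw F1 scores on one knowledge task.
task_results = {"model_a": 0.62, "model_b": 0.48, "model_c": 0.55}
print(standardize_scores(task_results))
```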
Cite
Text
Yu et al. "KoLA: Carefully Benchmarking World Knowledge of Large Language Models." International Conference on Learning Representations, 2024.
Markdown
[Yu et al. "KoLA: Carefully Benchmarking World Knowledge of Large Language Models." International Conference on Learning Representations, 2024.](https://mlanthology.org/iclr/2024/yu2024iclr-kola/)
BibTeX
@inproceedings{yu2024iclr-kola,
title = {{KoLA: Carefully Benchmarking World Knowledge of Large Language Models}},
author = {Yu, Jifan and Wang, Xiaozhi and Tu, Shangqing and Cao, Shulin and Zhang-Li, Daniel and Lv, Xin and Peng, Hao and Yao, Zijun and Zhang, Xiaohan and Li, Hanming and Li, Chunyang and Zhang, Zheyuan and Bai, Yushi and Liu, Yantao and Xin, Amy and Yun, Kaifeng and Gong, Linlu and Lin, Nianyi and Chen, Jianhui and Wu, Zhili and Qi, Yunjia and Li, Weikai and Guan, Yong and Zeng, Kaisheng and Qi, Ji and Jin, Hailong and Liu, Jinxin and Gu, Yu and Yao, Yuan and Ding, Ning and Hou, Lei and Liu, Zhiyuan and Xu, Bin and Tang, Jie and Li, Juanzi},
booktitle = {International Conference on Learning Representations},
year = {2024},
url = {https://mlanthology.org/iclr/2024/yu2024iclr-kola/}
}