Diff-eRank: A Novel Rank-Based Metric for Evaluating Large Language Models

Abstract

Large Language Models (LLMs) have transformed natural language processing and extended their powerful capabilities to multi-modal domains. As LLMs continue to advance, it is crucial to develop diverse and appropriate metrics for their evaluation. In this paper, we introduce a novel rank-based metric, Diff-eRank, grounded in principles of information theory and geometry. Diff-eRank assesses LLMs by analyzing their hidden representations, providing a quantitative measure of how efficiently they eliminate redundant information during training. We demonstrate the applicability of Diff-eRank in both single-modal (e.g., language) and multi-modal settings. For language models, our results show that Diff-eRank increases with model size and correlates well with conventional metrics such as loss and accuracy. In the multi-modal context, we propose an alignment evaluation method based on eRank, and use it to verify that contemporary multi-modal LLMs exhibit strong alignment. Our code is publicly available at https://github.com/waltonfuture/Diff-eRank.
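The metric builds on the notion of effective rank (eRank), i.e., the exponentiated Shannon entropy of the normalized singular-value spectrum of a representation matrix, with Diff-eRank taken as the change in eRank between an untrained and a trained model. The sketch below is an illustrative reconstruction under that assumption, not the paper's exact implementation (the official normalization details live in the linked repository); the function names `erank` and `diff_erank` are hypothetical.

```python
import numpy as np

def erank(R: np.ndarray) -> float:
    """Effective rank of a representation matrix (rows = tokens, cols = hidden dims).

    Sketch based on the standard effective-rank definition:
    exp of the Shannon entropy of the normalized singular values.
    """
    # Center the representations so rank reflects spread, not the mean offset.
    R = R - R.mean(axis=0, keepdims=True)
    # Singular values of the centered matrix.
    s = np.linalg.svd(R, compute_uv=False)
    # Normalize singular values into a probability distribution.
    p = s / s.sum()
    p = p[p > 0]  # drop zeros so 0*log(0) is treated as 0
    # Effective rank = exp(entropy); ranges from 1 (rank-1) to min(n, d).
    return float(np.exp(-(p * np.log(p)).sum()))

def diff_erank(R_untrained: np.ndarray, R_trained: np.ndarray) -> float:
    """Diff-eRank as the drop in effective rank after training (assumed sign convention)."""
    return erank(R_untrained) - erank(R_trained)
```

For intuition: a rank-1 representation matrix yields an eRank of exactly 1, and a matrix whose centered columns are orthogonal with equal norms yields an eRank equal to its ordinary rank; real hidden states fall in between, and a larger drop after training suggests more redundant information has been eliminated.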

Cite

Text

Wei et al. "Diff-eRank: A Novel Rank-Based Metric for Evaluating Large Language Models." Neural Information Processing Systems, 2024. doi:10.52202/079017-1248

Markdown

[Wei et al. "Diff-eRank: A Novel Rank-Based Metric for Evaluating Large Language Models." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/wei2024neurips-differank/) doi:10.52202/079017-1248

BibTeX

@inproceedings{wei2024neurips-differank,
  title     = {{Diff-eRank: A Novel Rank-Based Metric for Evaluating Large Language Models}},
  author    = {Wei, Lai and Tan, Zhiquan and Li, Chenghai and Wang, Jindong and Huang, Weiran},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-1248},
  url       = {https://mlanthology.org/neurips/2024/wei2024neurips-differank/}
}