PhyloLM: Inferring the Phylogeny of Large Language Models and Predicting Their Performances in Benchmarks

Abstract

This paper introduces PhyloLM, a method that adapts phylogenetic algorithms to Large Language Models (LLMs) to explore whether and how they relate to each other and to predict their performance characteristics. Our method calculates a phylogenetic distance metric based on the similarity of LLMs' outputs. The resulting metric is then used to construct dendrograms, which satisfactorily capture known relationships across a set of 111 open-source and 45 closed models. Furthermore, our phylogenetic distance predicts performance in standard benchmarks, thus demonstrating its functional validity and paving the way for a time- and cost-effective estimation of LLM capabilities. To sum up, by translating population genetics concepts to machine learning, we propose and validate a tool to evaluate LLM development, relationships and capabilities, even in the absence of transparent training information.
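
To make the idea of an output-similarity distance concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state the formula, so the sketch assumes a distance modelled on Nei's standard genetic distance from population genetics, with prompts playing the role of genes and sampled tokens the role of alleles. The model names, prompts, and completions are hypothetical toy data, and `token_frequencies` and `similarity_distance` are illustrative helper names.

```python
import math
from collections import Counter
from itertools import combinations


def token_frequencies(completions):
    """Turn {prompt: [sampled tokens]} into {prompt: {token: relative frequency}}."""
    freqs = {}
    for prompt, tokens in completions.items():
        counts = Counter(tokens)
        total = sum(counts.values())
        freqs[prompt] = {tok: n / total for tok, n in counts.items()}
    return freqs


def similarity_distance(freq_a, freq_b):
    """Assumed Nei-style distance: -log of a normalized overlap of token frequencies."""
    shared = sorted(freq_a.keys() & freq_b.keys())  # prompts answered by both models
    overlap = sum(
        freq_a[p].get(tok, 0.0) * q for p in shared for tok, q in freq_b[p].items()
    )
    norm_a = sum(q * q for p in shared for q in freq_a[p].values())
    norm_b = sum(q * q for p in shared for q in freq_b[p].values())
    similarity = overlap / math.sqrt(norm_a * norm_b)
    # No shared tokens at all means the models are maximally distant.
    return -math.log(similarity) if similarity > 0 else math.inf


# Hypothetical toy data: four sampled next tokens per prompt and per model.
samples = {
    "model_a": {"The sky is": ["blue", "blue", "blue", "grey"], "2 + 2 =": ["4"] * 4},
    "model_b": {"The sky is": ["blue", "blue", "grey", "grey"], "2 + 2 =": ["4"] * 4},
    "model_c": {
        "The sky is": ["vast", "vast", "clear", "blue"],
        "2 + 2 =": ["four", "four", "4", "four"],
    },
}

freqs = {name: token_frequencies(c) for name, c in samples.items()}
distances = {
    (m1, m2): similarity_distance(freqs[m1], freqs[m2])
    for m1, m2 in combinations(sorted(freqs), 2)
}

for pair, d in distances.items():
    print(pair, round(d, 3))
```

In this toy setup `model_a` and `model_b` end up much closer to each other than either is to `model_c`. A pairwise distance matrix of this kind is the usual input to hierarchical clustering or neighbor-joining routines, which is one way a dendrogram could be built from it.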

Cite

Text

Yax et al. "PhyloLM: Inferring the Phylogeny of Large Language Models and Predicting Their Performances in Benchmarks." International Conference on Learning Representations, 2025.

Markdown

[Yax et al. "PhyloLM: Inferring the Phylogeny of Large Language Models and Predicting Their Performances in Benchmarks." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/yax2025iclr-phylolm/)

BibTeX

@inproceedings{yax2025iclr-phylolm,
  title     = {{PhyloLM: Inferring the Phylogeny of Large Language Models and Predicting Their Performances in Benchmarks}},
  author    = {Yax, Nicolas and Oudeyer, Pierre-Yves and Palminteri, Stefano},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/yax2025iclr-phylolm/}
}