ConSCompF: Consistency-Focused Similarity Comparison Framework for Generative Large Language Models
Abstract
Large language models (LLMs) are one of the most important discoveries in machine learning in recent years. LLM-based artificial intelligence (AI) assistants, such as ChatGPT, have consistently attracted attention from researchers, investors, and the general public, driving the rapid growth of this industry. With dozens of new LLMs released every month, distinguishing between them has become increasingly difficult, creating demand for new LLM comparison methods. This work proposes the Consistency-focused Similarity Comparison Framework (ConSCompF) for generative large language models. It compares texts generated by two LLMs and produces a similarity score indicating the overall degree of similarity between their responses. The main advantage of this framework is that it can operate on a small amount of unlabeled data, such as chatbot instruction prompts, and does not require LLM developers to disclose any information about their product. To evaluate the efficacy of ConSCompF, two experiments aimed at identifying similarities among multiple LLMs are conducted. These experiments also examine the correlation between the similarity scores produced by ConSCompF and the differences in outputs measured by other benchmarking techniques, such as ROUGE-L. Finally, a series of experiments evaluates ConSCompF in a few-shot LLM comparison scenario. The framework can be used to compute similarity matrices for multiple LLMs, which can be visualized effectively using principal component analysis (PCA). The outputs of ConSCompF may offer insight into the data that might have been used during LLM training and may help detect potential investment fraud attempts.
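The abstract does not spell out how the similarity score is computed, so the following is an illustration only. It assumes that each model's answer to each prompt has already been converted into a fixed-size embedding (for example, by a sentence encoder). It then scores a pair of models by the mean cosine similarity of their answers to the same prompts and projects the resulting similarity matrix to two dimensions with PCA, the visualization the abstract mentions. The function names (`pairwise_similarity_matrix`, `pca_2d`), the plain cosine-similarity scoring, and the synthetic embeddings are all hypothetical. The sketch omits the consistency weighting that gives ConSCompF its name and is not the authors' implementation.

```python
import numpy as np


def pairwise_similarity_matrix(embeddings: dict[str, np.ndarray]) -> tuple[list[str], np.ndarray]:
    """Hypothetical scoring: mean cosine similarity between two models' answers.

    `embeddings[name]` is an assumed (n_prompts, dim) array holding one answer
    embedding per prompt for model `name`; row i of every array must refer to
    the same prompt. This is NOT ConSCompF's consistency-weighted score.
    """
    names = sorted(embeddings)
    # L2-normalise each answer embedding so that dot products are cosine similarities.
    unit = {n: e / np.linalg.norm(e, axis=1, keepdims=True) for n, e in embeddings.items()}
    k = len(names)
    sim = np.empty((k, k))
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            # Row-wise dot product gives one cosine per prompt; average over prompts.
            sim[i, j] = np.mean(np.sum(unit[a] * unit[b], axis=1))
    return names, sim


def pca_2d(matrix: np.ndarray) -> np.ndarray:
    """Project each row of `matrix` onto its first two principal components."""
    centered = matrix - matrix.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


# Synthetic example: model_b is a noisy copy of model_a; model_c is unrelated.
rng = np.random.default_rng(0)
base = rng.normal(size=(20, 8))
embs = {
    "model_a": base + 0.1 * rng.normal(size=base.shape),
    "model_b": base + 0.1 * rng.normal(size=base.shape),
    "model_c": rng.normal(size=base.shape),
}
names, sim = pairwise_similarity_matrix(embs)
coords = pca_2d(sim)  # one 2D point per model, ready to scatter-plot
```

Each row of the similarity matrix is treated as a feature vector describing one model, so models with similar similarity profiles land close together in the 2D plot. With only three models, the projection loses nothing, because three centered points always lie in a plane.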
Cite
Text
Karev and Xu. "ConSCompF: Consistency-Focused Similarity Comparison Framework for Generative Large Language Models." Journal of Artificial Intelligence Research, 2025. doi:10.1613/JAIR.1.17028
Markdown
[Karev and Xu. "ConSCompF: Consistency-Focused Similarity Comparison Framework for Generative Large Language Models." Journal of Artificial Intelligence Research, 2025.](https://mlanthology.org/jair/2025/karev2025jair-conscompf/) doi:10.1613/JAIR.1.17028
BibTeX
@article{karev2025jair-conscompf,
title = {{ConSCompF: Consistency-Focused Similarity Comparison Framework for Generative Large Language Models}},
author = {Karev, Alexey and Xu, Dong},
journal = {Journal of Artificial Intelligence Research},
year = {2025},
pages = {1325--1347},
doi = {10.1613/JAIR.1.17028},
volume = {82},
url = {https://mlanthology.org/jair/2025/karev2025jair-conscompf/}
}