Evaluating Text Humanlikeness via Self-Similarity Exponent
Abstract
Evaluating text generation quality in large language models (LLMs) is critical for their deployment. We investigate the self-similarity exponent S, a fractal-based measure, as a metric for quantifying "humanlikeness." Using texts from a publicly available dataset and Qwen models (with and without instruction tuning), we find that human-written texts exhibit S = 0.57, whereas non-instruct models show higher values and instruction-tuned models approach human-like patterns. Larger models improve quality, with greater gains under instruction tuning. Our findings suggest that S is an effective metric for assessing LLM performance.
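The abstract refers to the self-similarity exponent S without spelling out how it is estimated. As a rough, hedged illustration, the sketch below estimates a self-similarity (scaling) exponent from a per-token signal using first-order detrended fluctuation analysis (DFA); both the choice of signal (per-token surprisal) and the DFA estimator are assumptions made here for illustration and are not necessarily the paper's exact procedure.

```python
# Minimal sketch: estimate a self-similarity (scaling) exponent from a 1-D
# token-level signal via first-order detrended fluctuation analysis (DFA).
# NOTE: the per-token surprisal signal and DFA itself are illustrative
# assumptions; the paper's exact estimation method may differ.
import numpy as np

def dfa_exponent(signal, min_scale=8, max_scale=None, n_scales=20):
    """Estimate the scaling exponent of `signal` with first-order DFA."""
    x = np.asarray(signal, dtype=float)
    n = len(x)
    if max_scale is None:
        max_scale = n // 4
    # Integrated, mean-centered profile of the series.
    profile = np.cumsum(x - x.mean())
    scales = np.unique(
        np.logspace(np.log10(min_scale), np.log10(max_scale), n_scales).astype(int)
    )
    flucts = []
    for s in scales:
        n_win = n // s
        segments = profile[: n_win * s].reshape(n_win, s)
        t = np.arange(s)
        rms = []
        for seg in segments:
            coeffs = np.polyfit(t, seg, 1)            # linear detrend per window
            resid = seg - np.polyval(coeffs, t)
            rms.append(np.sqrt(np.mean(resid ** 2)))
        flucts.append(np.mean(rms))                   # fluctuation F(s)
    # Slope of log F(s) vs. log s is the estimated scaling exponent.
    slope, _ = np.polyfit(np.log(scales), np.log(flucts), 1)
    return slope

# Toy usage: random numbers stand in for per-token surprisal,
# i.e. -log p(token | context) from a language model over one text.
rng = np.random.default_rng(0)
surprisal = rng.standard_normal(4096)
print(f"estimated exponent: {dfa_exponent(surprisal):.2f}")  # ~0.5 for white noise
```

Under this kind of estimator, an uncorrelated signal yields an exponent near 0.5, while long-range-correlated signals yield larger values, which is the sense in which a single number like S = 0.57 can summarize the correlation structure of a text.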
Cite
Text
Pershin. "Evaluating Text Humanlikeness via Self-Similarity Exponent." ICLR 2025 Workshops: BuildingTrust, 2025.Markdown
[Pershin. "Evaluating Text Humanlikeness via Self-Similarity Exponent." ICLR 2025 Workshops: BuildingTrust, 2025.](https://mlanthology.org/iclrw/2025/pershin2025iclrw-evaluating/)BibTeX
@inproceedings{pershin2025iclrw-evaluating,
title = {{Evaluating Text Humanlikeness via Self-Similarity Exponent}},
author = {Pershin, Ilya},
booktitle = {ICLR 2025 Workshops: BuildingTrust},
year = {2025},
url = {https://mlanthology.org/iclrw/2025/pershin2025iclrw-evaluating/}
}