Evaluating Text Humanlikeness via Self-Similarity Exponent

Abstract

Evaluating text generation quality in large language models (LLMs) is critical for their deployment. We investigate the self-similarity exponent S, a fractal-based metric, for quantifying the "humanlikeness" of generated text. Using texts from a publicly available dataset and Qwen models (with and without instruction tuning), we find that human-written texts exhibit S = 0.57, non-instruct models show higher values, and instruction-tuned models approach human-like patterns. Larger models improve quality, and the gains are greater with instruction tuning. Our findings suggest S is an effective metric for assessing LLM performance.
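The abstract does not specify how the self-similarity exponent S is estimated. As a rough illustration only, the sketch below assumes a Hurst-style estimate obtained via detrended fluctuation analysis (DFA) over a per-token surprisal series produced by some language model; the function name, window settings, and the surprisal input are hypothetical and not taken from the paper.

```python
# Illustrative sketch, not the paper's method: estimate a self-similarity
# (Hurst-type) exponent of a 1D series with detrended fluctuation analysis.
import numpy as np

def self_similarity_exponent(signal, min_win=4, max_win=None):
    """Estimate a scaling exponent of a 1D series via DFA."""
    x = np.asarray(signal, dtype=float)
    n = len(x)
    if max_win is None:
        max_win = n // 4
    # Integrated (cumulative, mean-removed) profile of the series.
    profile = np.cumsum(x - x.mean())
    window_sizes = np.unique(
        np.logspace(np.log10(min_win), np.log10(max_win), 20).astype(int)
    )
    fluctuations = []
    for w in window_sizes:
        n_windows = n // w
        segments = profile[: n_windows * w].reshape(n_windows, w)
        t = np.arange(w)
        # Detrend each window with a linear fit and record the RMS residual.
        rms = []
        for seg in segments:
            coeffs = np.polyfit(t, seg, 1)
            rms.append(np.sqrt(np.mean((seg - np.polyval(coeffs, t)) ** 2)))
        fluctuations.append(np.mean(rms))
    # The slope of log F(w) versus log w gives the scaling exponent.
    slope, _ = np.polyfit(np.log(window_sizes), np.log(fluctuations), 1)
    return slope

# Hypothetical usage: `surprisal` would be per-token negative log-probabilities
# from a language model scoring a document; a random placeholder is used here.
surprisal = np.random.default_rng(0).standard_normal(5000)
print(f"Estimated S = {self_similarity_exponent(surprisal):.2f}")
```

Under this assumed setup, values near 0.5 indicate uncorrelated increments, while deviations reflect long-range structure; the paper's reported S = 0.57 for human text would sit slightly above the uncorrelated baseline.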

Cite

Text

Pershin. "Evaluating Text Humanlikeness via Self-Similarity Exponent." ICLR 2025 Workshops: BuildingTrust, 2025.

Markdown

[Pershin. "Evaluating Text Humanlikeness via Self-Similarity Exponent." ICLR 2025 Workshops: BuildingTrust, 2025.](https://mlanthology.org/iclrw/2025/pershin2025iclrw-evaluating/)

BibTeX

@inproceedings{pershin2025iclrw-evaluating,
  title     = {{Evaluating Text Humanlikeness via Self-Similarity Exponent}},
  author    = {Pershin, Ilya},
  booktitle = {ICLR 2025 Workshops: BuildingTrust},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/pershin2025iclrw-evaluating/}
}