LLMs and Personalities: Inconsistencies Across Scales

Abstract

This study investigates the application of human psychometric assessments to large language models (LLMs) to examine their consistency and malleability in exhibiting personality traits. We administered the Big Five Inventory (BFI) and the Eysenck Personality Questionnaire-Revised (EPQ-R) to various LLMs across different model sizes and persona prompts. Our results reveal substantial variability in responses when question order is shuffled, challenging the notion of a stable LLM "personality." We find that larger models demonstrate more consistent responses across most personas, though this scaling behavior varies significantly by trait and persona type. The assistant persona showed the most predictable scaling patterns, while clinical personas exhibited more variable and sometimes extreme trait expressions. Including conversation history unexpectedly increased response variability. These findings have important implications for understanding LLM behavior under different conditions and raise questions about the consequences of scaling.

Cite

Text

Tommaso et al. "LLMs and Personalities: Inconsistencies Across Scales." NeurIPS 2024 Workshops: Behavioral_ML, 2024.

Markdown

[Tommaso et al. "LLMs and Personalities: Inconsistencies Across Scales." NeurIPS 2024 Workshops: Behavioral_ML, 2024.](https://mlanthology.org/neuripsw/2024/tommaso2024neuripsw-llms/)

BibTeX

@inproceedings{tommaso2024neuripsw-llms,
  title     = {{LLMs and Personalities: Inconsistencies Across Scales}},
  author    = {Tommaso, Tosato and Hegazy, Mahmood and Lemay, David and Abukalam, Mohammed and Rish, Irina and Dumas, Guillaume},
  booktitle = {NeurIPS 2024 Workshops: Behavioral_ML},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/tommaso2024neuripsw-llms/}
}