Semantic Entropy Neurons: Encoding Semantic Uncertainty in the Latent Space of LLMs

Abstract

Uncertainty estimation in Large Language Models (LLMs) is challenging because token-level uncertainty includes uncertainty over lexical and syntactic variation, and thus fails to accurately capture uncertainty over the semantic meaning of a generation. To address this, Farquhar et al. recently introduced semantic entropy (SE), which quantifies uncertainty over semantic meaning by clustering semantically equivalent generations and aggregating their token-level probabilities. Kossen et al. further demonstrated that SE can be cheaply and reliably predicted using linear probes on the model's hidden states. In this work, we build on these results and show that semantic uncertainty in LLMs can be predicted from only a very small set of neurons. We find these neurons by training linear probes with $L_1$ regularization. Our approach matches the performance of full-neuron probes in predicting SE. An intervention study further shows that these neurons causally affect the semantic uncertainty of model generations. Our findings reveal how hidden-state neurons encode semantic uncertainty, provide a method to manipulate this uncertainty, and offer insights for interpretability research.
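
To make the probing setup concrete, below is a minimal sketch (not the authors' code) of an $L_1$-regularized linear probe that predicts a semantic-entropy score from hidden states and then reads off the sparse set of neurons with non-zero weights. The data arrays, shapes, and the `alpha` value are illustrative assumptions; in the paper, hidden states come from the LLM and SE targets are computed as in Farquhar et al.

```python
# Minimal sketch of an L1-regularized linear probe for semantic entropy (SE).
# Hidden states and SE targets are placeholders; in practice they come from
# an LLM's residual stream and from clustering generations by meaning.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(2000, 4096))    # (n_prompts, hidden_dim), placeholder
semantic_entropy = rng.uniform(0, 2, size=2000)  # SE score per prompt, placeholder

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, semantic_entropy, test_size=0.2, random_state=0
)

# The L1 penalty drives most probe weights to exactly zero, so the surviving
# coordinates form a small candidate set of "semantic entropy neurons".
# alpha (assumed value) controls how sparse the probe becomes.
probe = Lasso(alpha=0.05)
probe.fit(X_train, y_train)

se_neurons = np.flatnonzero(probe.coef_)
print(f"non-zero neurons: {se_neurons.size} / {hidden_states.shape[1]}")
print(f"test R^2 of sparse probe: {probe.score(X_test, y_test):.3f}")
```

In this sketch, comparing the sparse probe's held-out score against an unregularized full-neuron probe would mirror the paper's claim that a handful of neurons suffice to predict SE; the indices in `se_neurons` are then natural targets for intervention experiments.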

Cite

Text

Han et al. "Semantic Entropy Neurons: Encoding Semantic Uncertainty in the Latent Space of LLMs." NeurIPS 2024 Workshops: MINT, 2024.

Markdown

[Han et al. "Semantic Entropy Neurons: Encoding Semantic Uncertainty in the Latent Space of LLMs." NeurIPS 2024 Workshops: MINT, 2024.](https://mlanthology.org/neuripsw/2024/han2024neuripsw-semantic/)

BibTeX

@inproceedings{han2024neuripsw-semantic,
  title     = {{Semantic Entropy Neurons: Encoding Semantic Uncertainty in the Latent Space of LLMs}},
  author    = {Han, Jiatong and Kossen, Jannik and Razzak, Muhammed and Gal, Yarin},
  booktitle = {NeurIPS 2024 Workshops: MINT},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/han2024neuripsw-semantic/}
}