CoSy: Evaluating Textual Explanations of Neurons

Abstract

A crucial aspect of understanding the complex nature of Deep Neural Networks (DNNs) is the ability to explain learned concepts within their latent representations. While methods exist to connect neurons to human-understandable textual descriptions, evaluating the quality of these explanations is challenging due to the lack of a unified quantitative approach. We introduce CoSy (Concept Synthesis), a novel, architecture-agnostic framework for evaluating textual explanations of latent neurons. Given textual explanations, our proposed framework uses a generative model conditioned on textual input to create data points representing the explanations. By comparing the neuron's response to these generated data points and control data points, we can estimate the quality of the explanation. We validate our framework through sanity checks and benchmark various neuron description methods for Computer Vision tasks, revealing significant differences in quality.

Cite

Text

Kopf et al. "CoSy: Evaluating Textual Explanations of Neurons." Neural Information Processing Systems, 2024. doi:10.52202/079017-1093

Markdown

[Kopf et al. "CoSy: Evaluating Textual Explanations of Neurons." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/kopf2024neurips-cosy/) doi:10.52202/079017-1093

BibTeX

@inproceedings{kopf2024neurips-cosy,
  title     = {{CoSy: Evaluating Textual Explanations of Neurons}},
  author    = {Kopf, Laura and Bommer, Philine Lou and Hedström, Anna and Lapuschkin, Sebastian and Höhne, Marina M.-C. and Bykov, Kirill},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-1093},
  url       = {https://mlanthology.org/neurips/2024/kopf2024neurips-cosy/}
}