Does a Neural Network Really Encode Symbolic Concepts?

Abstract

Recently, a series of studies has tried to extract interactions between input variables modeled by a DNN and to define such interactions as concepts encoded by the DNN. However, strictly speaking, there is still no solid guarantee that such interactions indeed represent meaningful concepts. Therefore, in this paper, we examine the trustworthiness of interaction concepts from four perspectives. Extensive empirical studies have verified that a well-trained DNN usually encodes sparse, transferable, and discriminative concepts, which partially aligns with human intuition. The code is released at https://github.com/sjtu-xai-lab/interaction-concept.
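For readers new to this line of work, the sketch below illustrates one common formulation of an interaction concept, the Harsanyi interaction I(S) = Σ_{T⊆S} (-1)^{|S|-|T|} v(T), where v(T) denotes the network output when only the input variables in T are kept and all other variables are masked with a baseline value. This is a minimal illustrative sketch under those assumptions, not the paper's exact implementation; the function names, the masking-by-baseline scheme, and the toy model are assumptions made for the example.

import itertools
import torch

def v(model, x, baseline, kept):
    # Model output when only the variables in `kept` are present;
    # every other variable is replaced by its baseline value (masking).
    masked = baseline.clone()
    idx = list(kept)
    if idx:
        masked[idx] = x[idx]
    return model(masked.unsqueeze(0)).squeeze().item()

def harsanyi_interaction(model, x, baseline, S):
    # I(S) = sum over all T subset of S of (-1)^{|S|-|T|} * v(T):
    # the output effect attributed jointly to the variables in S,
    # i.e. the numerical strength of the interaction concept S.
    S = list(S)
    total = 0.0
    for r in range(len(S) + 1):
        for T in itertools.combinations(S, r):
            total += (-1) ** (len(S) - len(T)) * v(model, x, baseline, T)
    return total

# Toy usage: a model with an explicit pairwise product term.
model = lambda z: z[:, 0] + z[:, 1] * z[:, 2]
x = torch.tensor([1.0, 2.0, 3.0])
baseline = torch.zeros(3)
print(harsanyi_interaction(model, x, baseline, {1, 2}))  # 6.0: the product term
print(harsanyi_interaction(model, x, baseline, {0, 1}))  # 0.0: no joint effect

Enumerating all subsets of S costs O(2^{|S|}) forward passes, which is one reason the sparsity of salient interactions examined in the paper matters in practice.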

Cite

Text

Li and Zhang. "Does a Neural Network Really Encode Symbolic Concepts?" International Conference on Machine Learning, 2023.

Markdown

[Li and Zhang. "Does a Neural Network Really Encode Symbolic Concepts?" International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/li2023icml-neural/)

BibTeX

@inproceedings{li2023icml-neural,
  title     = {{Does a Neural Network Really Encode Symbolic Concepts?}},
  author    = {Li, Mingjie and Zhang, Quanshi},
  booktitle = {International Conference on Machine Learning},
  year      = {2023},
  pages     = {20452--20469},
  volume    = {202},
  url       = {https://mlanthology.org/icml/2023/li2023icml-neural/}
}