CLIP-Dissect: Automatic Description of Neuron Representations in Deep Vision Networks
Abstract
In this paper, we propose CLIP-Dissect, a new technique to automatically describe the function of individual hidden neurons inside vision networks. CLIP-Dissect leverages recent advances in multimodal vision/language models to label internal neurons with open-ended concepts without the need for any labeled data or human examples. We show that CLIP-Dissect provides more accurate descriptions than existing methods for last-layer neurons, where the ground truth is available, as well as qualitatively good descriptions for hidden-layer neurons. In addition, our method is very flexible: it is model agnostic, can easily handle new concepts, and can be extended to take advantage of better multimodal models in the future. Finally, CLIP-Dissect is computationally efficient and can label all neurons from five layers of ResNet-50 in just 4 minutes, which is more than 10$\times$ faster than existing methods. Our code is available at https://github.com/Trustworthy-ML-Lab/CLIP-dissect.
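The sketch below illustrates the general idea described in the abstract: use CLIP to score a set of probing images against an open-ended concept list, then match each neuron's activation pattern over those images to the best-fitting concept. The concept list, probe image paths, and the correlation-style matching function used here are illustrative assumptions for this sketch, not the paper's exact choices (the paper evaluates several matching functions), and `neuron_acts` is assumed to be collected separately, e.g. via forward hooks on the network being dissected.

```python
# Minimal sketch of the CLIP-Dissect idea, assuming a correlation-style
# matching function; concept list and image paths are placeholders.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, preprocess = clip.load("ViT-B/32", device=device)

concepts = ["dog", "stripes", "water", "sky"]        # open-ended concept set (placeholder)
probe_paths = ["img0.jpg", "img1.jpg", "img2.jpg"]   # probing images (placeholder)

with torch.no_grad():
    # CLIP text embeddings for each concept.
    text_feat = clip_model.encode_text(clip.tokenize(concepts).to(device))
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

    # CLIP image embeddings for each probing image.
    imgs = torch.stack([preprocess(Image.open(p)) for p in probe_paths]).to(device)
    img_feat = clip_model.encode_image(imgs)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)

# Concept-activation matrix: similarity of every probe image to every concept.
P = (img_feat @ text_feat.T).float()                 # shape (num_images, num_concepts)

def describe_neuron(neuron_acts, P, concepts):
    """Pick the concept whose similarity profile over the probe images best
    matches the neuron's activation pattern (simple correlation stand-in).

    neuron_acts: (num_images,) tensor, e.g. spatial mean of one channel's
    activation on each probe image, gathered with forward hooks.
    """
    a = (neuron_acts - neuron_acts.mean()) / (neuron_acts.std() + 1e-8)
    p = (P - P.mean(dim=0)) / (P.std(dim=0) + 1e-8)
    scores = (a[:, None] * p).mean(dim=0)            # per-concept match score
    return concepts[scores.argmax().item()]
```

In this framing, the vision network under inspection never needs labeled data: the only supervision comes from CLIP's image-text similarities over the probing set, which is why swapping in a stronger multimodal model or a larger concept list requires no retraining.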
Cite
Text
Oikarinen and Weng. "CLIP-Dissect: Automatic Description of Neuron Representations in Deep Vision Networks." International Conference on Learning Representations, 2023.
Markdown
[Oikarinen and Weng. "CLIP-Dissect: Automatic Description of Neuron Representations in Deep Vision Networks." International Conference on Learning Representations, 2023.](https://mlanthology.org/iclr/2023/oikarinen2023iclr-clipdissect/)
BibTeX
@inproceedings{oikarinen2023iclr-clipdissect,
  title = {{CLIP-Dissect: Automatic Description of Neuron Representations in Deep Vision Networks}},
  author = {Oikarinen, Tuomas and Weng, Tsui-Wei},
  booktitle = {International Conference on Learning Representations},
  year = {2023},
  url = {https://mlanthology.org/iclr/2023/oikarinen2023iclr-clipdissect/}
}