Concept Gradient: Concept-Based Interpretation Without Linear Assumption

Abstract

Concept-based interpretations of black-box models are often more intuitive for humans to understand. The most widely adopted approach for concept-based gradient interpretation is the Concept Activation Vector (CAV). CAV relies on learning a linear relation between some latent representation of a given model and concepts. The premise that meaningful concepts lie in a linear subspace of model layers is usually implicitly assumed but does not hold true in general. In this work, we propose Concept Gradient (CG), which extends concept-based gradient interpretation methods to non-linear concept functions. We show that for a general (potentially non-linear) concept, we can mathematically measure how a small change of concept affects the model's prediction, which is an extension of gradient-based interpretation to the concept space. We demonstrate empirically that CG outperforms CAV in attributing concept importance on real-world datasets and perform a case study on a medical dataset. The code is available at github.com/jybai/concept-gradients.
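The sketch below illustrates the chain-rule idea the abstract describes: attributing a model's prediction to concepts by combining the input gradient of the prediction model with the input Jacobian of a (possibly non-linear) concept predictor. It is not the authors' implementation; the function name `concept_gradient`, the callables `f` and `g`, and the pseudo-inverse combination rule are illustrative assumptions — see the paper and the official repository for the exact definition.

```python
# Minimal sketch (PyTorch) of a concept-space gradient attribution.
# Assumption: the concept attribution is obtained by mapping the input
# gradient of f into concept space via the pseudo-inverse of the concept
# predictor's Jacobian (one natural realization of the chain rule).
import torch


def concept_gradient(f, g, x):
    """Attribute f(x) (a scalar prediction) to the k concepts predicted by g(x).

    f : callable, input tensor of shape (d,) -> scalar tensor
    g : callable, input tensor of shape (d,) -> tensor of shape (k,)
    x : input tensor of shape (d,)
    Returns a tensor of shape (k,) with one attribution per concept.
    """
    x = x.detach().requires_grad_(True)

    # df/dx: gradient of the prediction w.r.t. the input, shape (d,)
    grad_f = torch.autograd.grad(f(x), x)[0]

    # dg/dx: Jacobian of the concept predictor w.r.t. the input, shape (k, d)
    jac_g = torch.autograd.functional.jacobian(g, x)

    # Map the input gradient into concept space: if f were a function of the
    # concepts, f = h(g(x)), the chain rule gives grad_x f = J_g^T grad_c h,
    # so grad_c h can be recovered with the Moore-Penrose pseudo-inverse.
    return torch.linalg.pinv(jac_g).T @ grad_f
```

When `g` is a single linear concept, g(x) = v^T x, this combination reduces to the directional derivative of f along v (up to scaling), which is the connection to CAV-style conceptual sensitivity that the abstract alludes to.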

Cite

Text

Bai et al. "Concept Gradient: Concept-Based Interpretation Without Linear Assumption." International Conference on Learning Representations, 2023.

Markdown

[Bai et al. "Concept Gradient: Concept-Based Interpretation Without Linear Assumption." International Conference on Learning Representations, 2023.](https://mlanthology.org/iclr/2023/bai2023iclr-concept/)

BibTeX

@inproceedings{bai2023iclr-concept,
  title     = {{Concept Gradient: Concept-Based Interpretation Without Linear Assumption}},
  author    = {Bai, Andrew and Yeh, Chih-Kuan and Lin, Neil Y.C. and Ravikumar, Pradeep Kumar and Hsieh, Cho-Jui},
  booktitle = {International Conference on Learning Representations},
  year      = {2023},
  url       = {https://mlanthology.org/iclr/2023/bai2023iclr-concept/}
}