Sparsity-Guided Holistic Explanation for LLMs with Interpretable Inference-Time Intervention

Tan, Zhen; Chen, Tianlong; Zhang, Zhenyu; Liu, Huan

doi:10.1609/AAAI.V38I19.30160

AAAI 2024 pp. 21619-21627

Abstract

Large Language Models (LLMs) have achieved unprecedented breakthroughs in various natural language processing domains. However, the enigmatic "black-box" nature of LLMs remains a significant challenge for interpretability, hampering transparent and accountable applications. While past approaches, such as attention visualization, pivotal subnetwork extraction, and concept-based analyses, offer some insight, they often focus on either local or global explanations within a single dimension, occasionally falling short in providing comprehensive clarity. In response, we propose a novel methodology anchored in sparsity-guided techniques, aiming to provide a holistic interpretation of LLMs. Our framework, termed SparseCBM, innovatively integrates sparsity to elucidate three intertwined layers of interpretation: input, subnetwork, and concept levels. In addition, the newly introduced dimension of interpretable inference-time intervention facilitates dynamic adjustments to the model during deployment. Through rigorous empirical evaluations on real-world datasets, we demonstrate that SparseCBM delivers a profound understanding of LLM behaviors, setting it apart in both interpreting and ameliorating model inaccuracies. Code is provided in the supplementary material.

Cite

Text

Tan et al. "Sparsity-Guided Holistic Explanation for LLMs with Interpretable Inference-Time Intervention." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I19.30160

Markdown

[Tan et al. "Sparsity-Guided Holistic Explanation for LLMs with Interpretable Inference-Time Intervention." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/tan2024aaai-sparsity/) doi:10.1609/AAAI.V38I19.30160

BibTeX

@inproceedings{tan2024aaai-sparsity,
  title     = {{Sparsity-Guided Holistic Explanation for LLMs with Interpretable Inference-Time Intervention}},
  author    = {Tan, Zhen and Chen, Tianlong and Zhang, Zhenyu and Liu, Huan},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {21619--21627},
  doi       = {10.1609/AAAI.V38I19.30160},
  url       = {https://mlanthology.org/aaai/2024/tan2024aaai-sparsity/}
}