SPEX: Scaling Feature Interaction Explanations for LLMs

Kang, Justin Singh; Butler, Landon; Agarwal, Abhineet; Erginbas, Yigit Efe; Pedarsani, Ramtin; Yu, Bin; Ramchandran, Kannan

ICLRW 2025

Abstract

Large language models (LLMs) have revolutionized machine learning due to their ability to capture complex interactions between input features. Popular post-hoc explanation methods like SHAP provide *marginal* feature attributions, while their extensions to interaction importances only scale to small input lengths ($\approx 20$). We propose *Spectral Explainer* (SPEX), a model-agnostic interaction attribution algorithm that efficiently scales to large input lengths ($\approx 1000$). SPEX exploits the underlying natural sparsity among interactions, which is common in real-world data, and applies a sparse Fourier transform using a channel decoding algorithm to efficiently identify important interactions. We perform experiments across three difficult long-context datasets that require LLMs to utilize interactions between inputs to complete the task. For large inputs, SPEX outperforms marginal attribution methods by up to 20% in terms of faithfully reconstructing LLM outputs. Further, SPEX successfully identifies key features and interactions that strongly influence model output. For one of our datasets, *HotpotQA*, SPEX provides interactions that align with human annotations. Finally, we use our model-agnostic approach to generate explanations to demonstrate abstract reasoning in closed-source LLMs (*GPT-4o mini*) and compositional reasoning in vision-language models.
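
To make the idea of sparse interactions in the Fourier domain concrete, here is a minimal sketch. It is not the SPEX algorithm: it is a brute-force Boolean Fourier (Walsh-Hadamard) transform over all $2^n$ input masks, using one common sign convention. The function `toy_model` and its coefficients are hypothetical, chosen only so that one feature matters on its own and two others matter only jointly.

```python
from itertools import product


def fourier_coefficients(f, n):
    """Brute-force Boolean Fourier transform of f: {0,1}^n -> R.

    Returns {S: F(S)}, where S is a tuple of feature indices and
    F(S) = 2^-n * sum over masks m of f(m) * (-1)^(sum of m_i for i in S).
    """
    masks = list(product((0, 1), repeat=n))
    coeffs = {}
    for k in masks:  # k is the indicator vector of the subset S
        total = 0.0
        for m in masks:
            parity = sum(ki & mi for ki, mi in zip(k, m)) % 2
            total += -f(m) if parity else f(m)
        subset = tuple(i for i, ki in enumerate(k) if ki)
        coeffs[subset] = total / len(masks)
    return coeffs


def toy_model(mask):
    # Hypothetical value function (an assumption for illustration):
    # feature 0 contributes alone; features 1 and 2 contribute only together;
    # feature 3 is irrelevant.
    return 2.0 * mask[0] + 3.0 * mask[1] * mask[2]


coeffs = fourier_coefficients(toy_model, n=4)

# Keep only the non-zero coefficients: the spectrum is sparse.
significant = {S: c for S, c in coeffs.items() if abs(c) > 1e-9}
for subset, value in sorted(significant.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(subset, value)
```

Only 5 of the 16 coefficients are non-zero, and the single pairwise term sits on features 1 and 2. The enumeration above costs $2^n$ model evaluations, which is infeasible at the input lengths the abstract targets; the abstract attributes SPEX's scalability to recovering such a sparse spectrum with a sparse Fourier transform instead.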

Cite

Text

Kang et al. "SPEX: Scaling Feature Interaction Explanations for LLMs." ICLR 2025 Workshops: BuildingTrust, 2025.

Markdown

[Kang et al. "SPEX: Scaling Feature Interaction Explanations for LLMs." ICLR 2025 Workshops: BuildingTrust, 2025.](https://mlanthology.org/iclrw/2025/kang2025iclrw-spex/)

BibTeX

@inproceedings{kang2025iclrw-spex,
  title     = {{SPEX: Scaling Feature Interaction Explanations for LLMs}},
  author    = {Kang, Justin Singh and Butler, Landon and Agarwal, Abhineet and Erginbas, Yigit Efe and Pedarsani, Ramtin and Yu, Bin and Ramchandran, Kannan},
  booktitle = {ICLR 2025 Workshops: BuildingTrust},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/kang2025iclrw-spex/}
}