Machine Learning Explainability from an Information-Theoretic Perspective

Abstract

The primary challenge for practitioners with multiple *post-hoc gradient-based* interpretability methods is to benchmark them and select the best. Using information theory, we represent finding the optimal explainer as a rate-distortion optimization problem. Therefore:

- We propose an information-theoretic test, `InfoExplain`, to resolve the benchmarking ambiguity in a model-agnostic manner without additional user data (apart from the input features, model, and explanations).
- We show that `InfoExplain` is extendable to utilise human-interpretable concepts, deliver performance guarantees, and filter out erroneous explanations.

The accompanying experiments and code can be found at github.com/DebarghaG/info-explain.
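
For readers unfamiliar with the rate-distortion framing mentioned in the abstract, the block below sketches the standard rate-distortion objective in LaTeX. It is illustrative only: the paper's exact formulation of explainer selection (which quantities play the roles of source, reconstruction, and distortion) is defined in the full text, and the symbols here are generic placeholders rather than the authors' notation.

% Standard rate-distortion objective (generic form, not the paper's exact notation):
% the optimal trade-off R(D) is the smallest mutual information between the source X
% and its representation \hat{X} achievable within an expected-distortion budget D.
\begin{equation}
  R(D) \;=\; \min_{p(\hat{x}\mid x)\,:\;\mathbb{E}\left[d(X,\hat{X})\right]\le D} I(X;\hat{X})
\end{equation}
% Read loosely in the explainability setting suggested by the abstract, an explanation
% acts as a compressed representation of the model's behaviour, and candidate explainers
% can be compared by how much information they retain for a given distortion budget.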

Cite

Text

Ganguly and Gupta. "Machine Learning Explainability from an Information-Theoretic Perspective." NeurIPS 2022 Workshops: InfoCog, 2022.

Markdown

[Ganguly and Gupta. "Machine Learning Explainability from an Information-Theoretic Perspective." NeurIPS 2022 Workshops: InfoCog, 2022.](https://mlanthology.org/neuripsw/2022/ganguly2022neuripsw-machine/)

BibTeX

@inproceedings{ganguly2022neuripsw-machine,
  title     = {{Machine Learning Explainability from an Information-Theoretic Perspective}},
  author    = {Ganguly, Debargha and Gupta, Debayan},
  booktitle = {NeurIPS 2022 Workshops: InfoCog},
  year      = {2022},
  url       = {https://mlanthology.org/neuripsw/2022/ganguly2022neuripsw-machine/}
}