On Interpretability and Overreliance
Abstract
One of the underlying motivations for creating interpretable models is that they may help humans make better decisions. Given an interpretable model, a human decision-maker may be able to better understand the model's reasoning and incorporate its insights into their own decision-making process. Whether this effect occurs in practice is difficult to validate: it requires accounting for individuals' prior beliefs and objectively measuring when reliance on the model goes beyond what is reasonable given the available information. In this work, we address these challenges and validate whether interpretability improves decision-making. Concretely, we compare how humans make decisions given a black-box model and an interpretable model, while controlling for their prior beliefs and rigorously quantifying rational behavior. Our results show that interpretable models can lead to overreliance, and that the level of overreliance varies across models that we would consider equally interpretable. These findings raise fundamental concerns about current approaches to AI-assisted decision-making. They suggest that making models transparent is insufficient, and currently counterproductive, for promoting appropriate reliance.
Cite
Text
Skirzynski et al. "On Interpretability and Overreliance." NeurIPS 2024 Workshops: InterpretableAI, 2024.
Markdown
[Skirzynski et al. "On Interpretability and Overreliance." NeurIPS 2024 Workshops: InterpretableAI, 2024.](https://mlanthology.org/neuripsw/2024/skirzynski2024neuripsw-interpretability/)
BibTeX
@inproceedings{skirzynski2024neuripsw-interpretability,
  title = {{On Interpretability and Overreliance}},
  author = {Skirzynski, Julian and Glassman, Elena and Ustun, Berk},
  booktitle = {NeurIPS 2024 Workshops: InterpretableAI},
  year = {2024},
  url = {https://mlanthology.org/neuripsw/2024/skirzynski2024neuripsw-interpretability/}
}