On the Importance of Application-Grounded Experimental Design for Evaluating Explainable ML Methods

Abstract

Most existing evaluations of explainable machine learning (ML) methods rely on simplifying assumptions or proxies that do not reflect real-world use cases; the handful of more robust evaluations in real-world settings have design shortcomings that generally lead to overestimation of the methods' real-world utility. In this work, we seek to address this gap by conducting a study that evaluates post-hoc explainable ML methods in a setting consistent with the application context, and by providing a template for future evaluation studies. We modify and improve a prior study on e-commerce fraud detection by relaxing the original work's simplifying assumptions that departed from the deployment context. Our study finds no evidence for the utility of the tested explainable ML methods in this context, a drastically different conclusion from that of the earlier work. This highlights how seemingly trivial experimental design choices can yield misleading conclusions about method utility. In addition, our work carries lessons about the necessity of not only evaluating explainable ML methods using tasks, data, users, and metrics grounded in the intended application context but also developing methods tailored to specific applications, moving beyond general-purpose explainable ML methods.

Cite

Text

Amarasinghe et al. "On the Importance of Application-Grounded Experimental Design for Evaluating Explainable ML Methods." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I19.30082

Markdown

[Amarasinghe et al. "On the Importance of Application-Grounded Experimental Design for Evaluating Explainable ML Methods." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/amarasinghe2024aaai-importance/) doi:10.1609/AAAI.V38I19.30082

BibTeX

@inproceedings{amarasinghe2024aaai-importance,
  title     = {{On the Importance of Application-Grounded Experimental Design for Evaluating Explainable ML Methods}},
  author    = {Amarasinghe, Kasun and Rodolfa, Kit T. and Jesus, Sérgio M. and Chen, Valerie and Balayan, Vladimir and Saleiro, Pedro and Bizarro, Pedro and Talwalkar, Ameet and Ghani, Rayid},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {20921--20929},
  doi       = {10.1609/AAAI.V38I19.30082},
  url       = {https://mlanthology.org/aaai/2024/amarasinghe2024aaai-importance/}
}