Human-in-the-Loop Interpretability Prior

Abstract

We often desire our models to be interpretable as well as accurate. Prior work on optimizing models for interpretability has relied on easy-to-quantify proxies, such as sparsity or the number of operations required. In this work, we optimize for interpretability by directly including humans in the optimization loop. We develop an algorithm that minimizes the number of user studies needed to find models that are both predictive and interpretable, and we demonstrate our approach on several datasets. Our human-subjects results show trends toward different proxy notions of interpretability on different datasets, suggesting that different proxies are preferred for different tasks.
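The sketch below illustrates, in a heavily simplified form, the kind of loop the abstract describes: expensive human evaluations are allocated across a pool of candidate models by a simple upper-confidence rule, and the final choice trades off predictive fit against the estimated human interpretability score. All names (`Candidate`, `user_study`, `select_model`) and the proxy "study" are illustrative assumptions, not the authors' actual algorithm or experimental protocol.

```python
"""Minimal, hypothetical sketch of human-in-the-loop model selection.

Assumptions (not from the paper): each candidate carries a precomputed
predictive fit; the real user study is replaced by a noisy sparsity proxy
so the example runs end to end.
"""
from dataclasses import dataclass
import numpy as np


@dataclass
class Candidate:
    name: str
    fit: float        # predictive quality, e.g. held-out log-likelihood
    complexity: int   # used only by the stand-in "user study" below


def user_study(c: Candidate) -> float:
    # Stand-in for a costly user study (e.g. measuring how quickly people
    # can simulate the model's predictions); here a noisy sparsity proxy.
    return -float(c.complexity) + np.random.normal(scale=0.1)


def select_model(candidates, budget=10, explore=1.0):
    """Spend a small budget of 'user studies' on the most promising models."""
    n = len(candidates)
    est = np.zeros(n)   # running estimate of each candidate's human score
    cnt = np.zeros(n)   # number of studies spent on each candidate
    fit = np.array([c.fit for c in candidates])

    for t in range(budget):
        # Upper-confidence bonus: rarely studied candidates get tried first.
        bonus = explore * np.sqrt(np.log(t + 2) / np.maximum(cnt, 1e-9))
        i = int(np.argmax(fit + est + bonus))
        score = user_study(candidates[i])
        cnt[i] += 1
        est[i] += (score - est[i]) / cnt[i]   # incremental mean update

    # Final pick balances predictive fit and estimated interpretability.
    return candidates[int(np.argmax(fit + est))]


if __name__ == "__main__":
    pool = [Candidate(f"m{k}", fit=1.0 - 0.05 * k, complexity=10 - k)
            for k in range(5)]
    print(select_model(pool, budget=8).name)
```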

Cite

Text

Lage et al. "Human-in-the-Loop Interpretability Prior." Neural Information Processing Systems, 2018.

Markdown

[Lage et al. "Human-in-the-Loop Interpretability Prior." Neural Information Processing Systems, 2018.](https://mlanthology.org/neurips/2018/lage2018neurips-humanintheloop/)

BibTeX

@inproceedings{lage2018neurips-humanintheloop,
  title     = {{Human-in-the-Loop Interpretability Prior}},
  author    = {Lage, Isaac and Ross, Andrew and Gershman, Samuel J. and Kim, Been and Doshi-Velez, Finale},
  booktitle = {Neural Information Processing Systems},
  year      = {2018},
  pages     = {10159--10168},
  url       = {https://mlanthology.org/neurips/2018/lage2018neurips-humanintheloop/}
}