The Building Blocks of Interpretability

Olah, Chris; Satyanarayan, Arvind; Johnson, Ian; Carter, Shan; Schubert, Ludwig; Ye, Katherine; Mordvintsev, Alexander

doi:10.23915/distill.00010

Distill 2018

Abstract

Distill articles are interactive publications and do not include traditional abstracts; this summary was written for the ML Anthology. The article presents a framework for composing interpretability techniques as modular building blocks. Combining these blocks makes it possible to construct rich interfaces that reveal what neural networks detect and how they build up understanding across layers.

Cite

Text

Olah et al. "The Building Blocks of Interpretability." Distill, 2018. doi:10.23915/distill.00010

Markdown

[Olah et al. "The Building Blocks of Interpretability." Distill, 2018.](https://mlanthology.org/distill/2018/olah2018distill-building/) doi:10.23915/distill.00010

BibTeX

@article{olah2018distill-building,
  title     = {{The Building Blocks of Interpretability}},
  author    = {Olah, Chris and Satyanarayan, Arvind and Johnson, Ian and Carter, Shan and Schubert, Ludwig and Ye, Katherine and Mordvintsev, Alexander},
  journal   = {Distill},
  year      = {2018},
  doi       = {10.23915/distill.00010},
  url       = {https://mlanthology.org/distill/2018/olah2018distill-building/}
}