Interpreting Intentionally Flawed Models with Linear Probes

Abstract

The representational differences between generalizing networks and intentionally flawed models can offer insight into the dynamics of network training. Do memorizing networks, e.g., networks that learn random label correspondences, focus on specific patterns in the data to memorize the labels? Are the features learned by a generalizing network affected by randomization of the model parameters? In high-risk domains such as medicine, law or finance, highlighting the representational differences that help generalization may be even more important than the model performance itself. In this paper, we probe the activations of intermediate layers with linear classification and regression. Results show that the bias of generalizing networks towards simple solutions is maintained even when statistical irregularities are intentionally introduced.
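As a rough illustration of what probing intermediate activations with linear classification and regression can look like, the sketch below fits a linear probe on a matrix of activations. It is not the authors' implementation: the activations are synthetic stand-ins, the helper names (`fit_linear_probe`, `probe_predict`, `r2_score`) are hypothetical, and the classification probe is approximated with ridge-regularized least squares on ±1 targets rather than whatever classifier the paper uses.

```python
import numpy as np


def fit_linear_probe(activations, targets, l2=1e-3):
    """Fit a ridge-regularized least-squares probe (hypothetical helper).

    Returns a weight vector whose last entry is the bias term.
    """
    X = np.hstack([activations, np.ones((activations.shape[0], 1))])
    A = X.T @ X + l2 * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ targets)


def probe_predict(activations, weights):
    """Apply a fitted probe to a matrix of activations."""
    X = np.hstack([activations, np.ones((activations.shape[0], 1))])
    return X @ weights


def r2_score(y_true, y_pred):
    """Coefficient of determination of the probe's predictions."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(0)

# Assumption: synthetic data standing in for flattened activations of one
# intermediate layer (400 inputs, 16 units). Real activations would be
# extracted from the network under study.
acts = rng.normal(size=(400, 16))
true_w = rng.normal(size=16)
concept = acts @ true_w + 0.1 * rng.normal(size=400)  # continuous target
labels = (concept > 0).astype(float)                  # binary target

train, test = slice(0, 300), slice(300, 400)

# Regression probe: how well is the continuous target linearly decodable?
w_reg = fit_linear_probe(acts[train], concept[train])
reg_r2 = r2_score(concept[test], probe_predict(acts[test], w_reg))

# Classification probe, approximated by least squares on +/-1 targets.
w_cls = fit_linear_probe(acts[train], 2.0 * labels[train] - 1.0)
cls_acc = np.mean((probe_predict(acts[test], w_cls) > 0) == (labels[test] == 1))

print(f"regression probe R^2: {reg_r2:.3f}")
print(f"classification probe accuracy: {cls_acc:.3f}")
```

Repeating the same fit layer by layer, and for both a generalizing and a flawed model, would give the kind of per-layer comparison the abstract describes; held-out scores are used here so that the probe's own capacity to memorize does not inflate the result.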

Cite

Text

Graziani et al. "Interpreting Intentionally Flawed Models with Linear Probes." IEEE/CVF International Conference on Computer Vision Workshops, 2019. doi:10.1109/ICCVW.2019.00096

Markdown

[Graziani et al. "Interpreting Intentionally Flawed Models with Linear Probes." IEEE/CVF International Conference on Computer Vision Workshops, 2019.](https://mlanthology.org/iccvw/2019/graziani2019iccvw-interpreting/) doi:10.1109/ICCVW.2019.00096

BibTeX

@inproceedings{graziani2019iccvw-interpreting,
  title     = {{Interpreting Intentionally Flawed Models with Linear Probes}},
  author    = {Graziani, Mara and Müller, Henning and Andrearczyk, Vincent},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2019},
  pages     = {743--747},
  doi       = {10.1109/ICCVW.2019.00096},
  url       = {https://mlanthology.org/iccvw/2019/graziani2019iccvw-interpreting/}
}