Introspection, Updatability, and Uncertainty Quantification with Transformers: Concrete Methods for AI Safety
Abstract
When deploying Transformer networks, we seek the ability to introspect predictions against instances with known labels; to update the model without full retraining; and to provide reliable uncertainty quantification over the predictions. We demonstrate that these properties are achievable via recently proposed approaches for approximating deep neural networks with instance-based metric learners, at varying resolutions of the input, together with the associated Venn-ADMIT Predictor for constructing prediction sets. We consider a challenging (but non-adversarial) task: zero-shot sequence labeling (i.e., feature detection) in a low-accuracy, class-imbalanced, covariate-shifted setting, while requiring a high confidence level.
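To make the abstract's core idea concrete, below is a minimal sketch, assuming synthetic vectors as stand-ins for a Transformer's hidden states: a soft k-nearest-neighbor vote over labeled support instances approximates the network's predictions (exposing matched exemplars for introspection), appending newly labeled instances updates the predictor without retraining, and a simple probability-threshold rule stands in for set-valued prediction. The threshold rule is a generic placeholder, not the Venn-ADMIT construction described in the paper, and all names and parameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "support" embeddings standing in for Transformer hidden states of
# training instances with known labels (illustrative placeholder data only).
n_support, dim, n_classes = 500, 16, 3
support_vecs = rng.normal(size=(n_support, dim))
support_labels = rng.integers(0, n_classes, size=n_support)

def knn_class_probs(query, k=25):
    """Approximate a network's prediction for `query` with a soft
    k-nearest-neighbor vote over the labeled support set; the returned
    neighbor indices expose the matched exemplars for introspection."""
    dists = np.linalg.norm(support_vecs - query, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = 1.0 / (dists[nearest] + 1e-8)  # inverse-distance weighting
    probs = np.zeros(n_classes)
    for idx, w in zip(nearest, weights):
        probs[support_labels[idx]] += w
    return probs / probs.sum(), nearest

def prediction_set(query, alpha=0.1):
    """Return every label whose estimated probability exceeds alpha: a
    generic stand-in for a calibrated set-valued predictor, NOT Venn-ADMIT."""
    probs, _ = knn_class_probs(query)
    return [c for c in range(n_classes) if probs[c] >= alpha]

query = rng.normal(size=dim)
print(prediction_set(query))

# Updatability: appending newly labeled instances changes future predictions
# without retraining the underlying network.
support_vecs = np.concatenate([support_vecs, rng.normal(size=(1, dim))])
support_labels = np.concatenate([support_labels, np.array([0])])
print(prediction_set(query))
```

Because predictions are read directly off the labeled support set, both introspection (inspecting the matched exemplars) and updating (editing that set) come essentially for free in this family of methods.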
Cite
Text
Schmaltz and Rasooly. "Introspection, Updatability, and Uncertainty Quantification with Transformers: Concrete Methods for AI Safety." NeurIPS 2022 Workshops: MLSW, 2022.
Markdown
[Schmaltz and Rasooly. "Introspection, Updatability, and Uncertainty Quantification with Transformers: Concrete Methods for AI Safety." NeurIPS 2022 Workshops: MLSW, 2022.](https://mlanthology.org/neuripsw/2022/schmaltz2022neuripsw-introspection/)
BibTeX
@inproceedings{schmaltz2022neuripsw-introspection,
  title = {{Introspection, Updatability, and Uncertainty Quantification with Transformers: Concrete Methods for AI Safety}},
  author = {Schmaltz, Allen and Rasooly, Danielle},
  booktitle = {NeurIPS 2022 Workshops: MLSW},
  year = {2022},
  url = {https://mlanthology.org/neuripsw/2022/schmaltz2022neuripsw-introspection/}
}