Introspection, Updatability, and Uncertainty Quantification with Transformers: Concrete Methods for AI Safety
Abstract
When deploying Transformer networks, we seek the ability to introspect predictions against instances with known labels; to update the model without full retraining; and to provide reliable uncertainty quantification over the predictions. We demonstrate that these properties are achievable via recently proposed approaches for approximating deep neural networks with instance-based metric learners, at varying resolutions of the input, together with the associated Venn-ADMIT Predictor for constructing prediction sets. We consider a challenging (but non-adversarial) task: zero-shot sequence labeling (i.e., feature detection) in a low-accuracy, class-imbalanced, covariate-shifted setting, while requiring a high confidence level.
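To make the abstract's core idea concrete, below is a minimal sketch, assuming synthetic vectors as stand-ins for a Transformer's hidden states: a soft k-nearest-neighbor vote over labeled support instances approximates the network's predictions (exposing matched exemplars for introspection), appending newly labeled instances updates the predictor without retraining, and a simple probability-threshold rule stands in for set-valued prediction. The threshold rule is a generic placeholder, not the Venn-ADMIT construction described in the paper, and all names and parameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "support" embeddings standing in for Transformer hidden states of
# training instances with known labels (illustrative placeholder data only).
n_support, dim, n_classes = 500, 16, 3
support_vecs = rng.normal(size=(n_support, dim))
support_labels = rng.integers(0, n_classes, size=n_support)

def knn_class_probs(query, k=25):
    """Approximate a network's prediction for `query` with a soft
    k-nearest-neighbor vote over the labeled support set; the returned
    neighbor indices expose the matched exemplars for introspection."""
    dists = np.linalg.norm(support_vecs - query, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = 1.0 / (dists[nearest] + 1e-8)  # inverse-distance weighting
    probs = np.zeros(n_classes)
    for idx, w in zip(nearest, weights):
        probs[support_labels[idx]] += w
    return probs / probs.sum(), nearest

def prediction_set(query, alpha=0.1):
    """Return every label whose estimated probability exceeds alpha: a
    generic stand-in for a calibrated set-valued predictor, NOT Venn-ADMIT."""
    probs, _ = knn_class_probs(query)
    return [c for c in range(n_classes) if probs[c] >= alpha]

query = rng.normal(size=dim)
print(prediction_set(query))

# Updatability: appending newly labeled instances changes future predictions
# without retraining the underlying network.
support_vecs = np.concatenate([support_vecs, rng.normal(size=(1, dim))])
support_labels = np.concatenate([support_labels, np.array([0])])
print(prediction_set(query))
```

Because predictions are read directly off the labeled support set, both introspection (inspecting the matched exemplars) and updating (editing that set) come essentially for free in this family of methods.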
Cite
Text
Schmaltz and Rasooly. "Introspection, Updatability, and Uncertainty Quantification with Transformers: Concrete Methods for AI Safety." NeurIPS 2022 Workshops: MLSW, 2022.
Markdown
[Schmaltz and Rasooly. "Introspection, Updatability, and Uncertainty Quantification with Transformers: Concrete Methods for AI Safety." NeurIPS 2022 Workshops: MLSW, 2022.](https://mlanthology.org/neuripsw/2022/schmaltz2022neuripsw-introspection/)
BibTeX
@inproceedings{schmaltz2022neuripsw-introspection,
  title = {{Introspection, Updatability, and Uncertainty Quantification with Transformers: Concrete Methods for AI Safety}},
  author = {Schmaltz, Allen and Rasooly, Danielle},
  booktitle = {NeurIPS 2022 Workshops: MLSW},
  year = {2022},
  url = {https://mlanthology.org/neuripsw/2022/schmaltz2022neuripsw-introspection/}
}