Experts Don't Cheat: Learning What You Don't Know by Predicting Pairs
Abstract
Identifying how much a model $\hat{p}_{\scriptscriptstyle{Y|X}}^{\theta}$ knows about the stochastic real-world process $p_{\scriptscriptstyle{Y|X}}$ it was trained on is important to ensure it avoids producing "hallucinated" answers or taking unsafe actions, but this is difficult for generative models because probabilistic predictions do not distinguish between per-response noise (aleatoric uncertainty) and lack of knowledge about the process (epistemic uncertainty). We propose a general strategy for decomposing these: train a model to predict *pairs* of independent responses drawn from the true distribution, allow it to "cheat" by observing one response while predicting the other, then measure how much it cheats. We prove that this strategy incentivizes models to become *second-order calibrated*, which allows you to both accurately estimate the gap between $\hat{p}_{\scriptscriptstyle{Y|X}}^{\theta}$ and $p_{\scriptscriptstyle{Y|X}}$ and construct decoding algorithms with bounded probability of generating an incorrect statement. Empirically, we show that our strategy outperforms other filtering methods on a synthetic language modeling task (describing digits of $\pi$).
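To make the "measure how much it cheats" step concrete, here is a minimal sketch, not the paper's exact estimator: it assumes a pair predictor that, for a fixed query $x$, outputs a joint probability table over two discrete responses, and the function name `cheating_score` and the choice of total-variation distance are illustrative assumptions.

```python
import numpy as np

def cheating_score(joint: np.ndarray) -> float:
    """Estimate epistemic uncertainty from a learned joint over response pairs.

    `joint[i, j]` is the model's probability of (Y1 = i, Y2 = j) for a fixed
    query x. The name and the distance used below are illustrative assumptions,
    not the paper's exact estimator.
    """
    marginal_y1 = joint.sum(axis=1)   # model's p(Y1 = i | x)
    marginal_y2 = joint.sum(axis=0)   # model's p(Y2 = j | x)

    score = 0.0
    for j, p_y2 in enumerate(marginal_y2):
        if p_y2 == 0.0:
            continue
        cond_y1 = joint[:, j] / p_y2  # "cheating" prediction p(Y1 = i | x, Y2 = j)
        # Total-variation distance between the cheating conditional and the
        # marginal, weighted by how likely the observed second response is.
        score += p_y2 * 0.5 * np.abs(cond_y1 - marginal_y1).sum()
    return score

# Example: a pair predictor that knows the answer vs. one that copies the peeked response.
confident = np.outer([0.9, 0.1], [0.9, 0.1])   # Y1 and Y2 independent given x
unsure = np.array([[0.5, 0.0], [0.0, 0.5]])    # prediction for Y1 tracks the observed Y2
print(cheating_score(confident))  # ~0.0: no cheating, low epistemic uncertainty
print(cheating_score(unsure))     # 0.5: heavy cheating, high epistemic uncertainty
```

If the pair predictor has truly learned $p_{\scriptscriptstyle{Y|X}}$, the two responses are conditionally independent given $x$, so observing $Y_2$ does not shift its prediction for $Y_1$ and the score is zero; a large score indicates the model relies on the peeked-at response and therefore lacks knowledge about the process.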
Cite
Text
Johnson et al. "Experts Don't Cheat: Learning What You Don't Know by Predicting Pairs." ICLR 2024 Workshops: R2-FM, 2024.
Markdown
[Johnson et al. "Experts Don't Cheat: Learning What You Don't Know by Predicting Pairs." ICLR 2024 Workshops: R2-FM, 2024.](https://mlanthology.org/iclrw/2024/johnson2024iclrw-experts/)
BibTeX
@inproceedings{johnson2024iclrw-experts,
  title     = {{Experts Don't Cheat: Learning What You Don't Know by Predicting Pairs}},
  author    = {Johnson, Daniel D. and Tarlow, Daniel and Duvenaud, David and Maddison, Chris J.},
  booktitle = {ICLR 2024 Workshops: R2-FM},
  year      = {2024},
  url       = {https://mlanthology.org/iclrw/2024/johnson2024iclrw-experts/}
}