Models That Prove Their Own Correctness
Abstract
How can we trust the correctness of a learned model on a particular input of interest? Model accuracy is typically measured *on average* over a distribution of inputs, giving no guarantee for any fixed input. This paper proposes a theoretically founded solution to this problem: training *Self-Proving models* that prove the correctness of their output to a verification algorithm $V$ via an Interactive Proof. We devise a generic method for learning Self-Proving models, and we prove convergence bounds under certain assumptions. As an empirical exploration, we use our learning method to train a Self-Proving transformer that computes the Greatest Common Divisor (GCD) *and* proves the correctness of its answer. Our code is available [here](https://github.com/orrp/self-proving-models).
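To make the verifier $V$ concrete for the GCD task, here is a minimal sketch of what such a verification algorithm could look like: the model outputs a claimed GCD together with Bézout coefficients as a certificate, and $V$ accepts only if the certificate checks out. The function name `verify_gcd` and the coefficient names `u`, `v` are illustrative assumptions, not necessarily the exact protocol used in the paper.

```python
from math import gcd

def verify_gcd(a: int, b: int, d: int, u: int, v: int) -> bool:
    """Hypothetical verifier V for the GCD task.

    The model claims gcd(a, b) = d and supplies Bezout coefficients
    (u, v) as its proof. V accepts iff
      (1) d > 0 and d divides both a and b (d is a common divisor), and
      (2) u*a + v*b == d  (every common divisor of a and b divides d,
                           so d is the *greatest* common divisor).
    """
    if d <= 0:
        return False
    return a % d == 0 and b % d == 0 and u * a + v * b == d

# An honest prover is always accepted.
a, b = 84, 30
d = gcd(a, b)      # 6
u, v = -1, 3       # Bezout coefficients: -1*84 + 3*30 = 6
assert verify_gcd(a, b, d, u, v)

# A wrong answer is rejected, even with a forged certificate:
# 2 divides both 84 and 30, but 1*84 - 1*30 = 54 != 2.
assert not verify_gcd(a, b, 2, 1, -1)
```

With this kind of check, a correct answer can always be certified (Bézout coefficients exist and can be computed by the extended Euclidean algorithm), while no certificate can make the verifier accept an incorrect GCD, so acceptance by $V$ guarantees correctness on that particular input.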
Cite
Text
Amit et al. "Models That Prove Their Own Correctness." ICML 2024 Workshops: TF2M, 2024.
Markdown
[Amit et al. "Models That Prove Their Own Correctness." ICML 2024 Workshops: TF2M, 2024.](https://mlanthology.org/icmlw/2024/amit2024icmlw-models-b/)
BibTeX
@inproceedings{amit2024icmlw-models-b,
title = {{Models That Prove Their Own Correctness}},
author = {Amit, Noga and Goldwasser, Shafi and Paradise, Orr and Rothblum, Guy N.},
booktitle = {ICML 2024 Workshops: TF2M},
year = {2024},
url = {https://mlanthology.org/icmlw/2024/amit2024icmlw-models-b/}
}