A Generalization Theory for Zero-Shot Prediction

Abstract

A modern paradigm for generalization in machine learning and AI consists of pre-training a task-agnostic foundation model, generally obtained using self-supervised and multimodal contrastive learning. The resulting representations can then be used for prediction on a downstream task for which no labeled data is available. We present a theoretical framework for better understanding this approach, known as zero-shot prediction. We identify the target quantities that zero-shot prediction aims to learn, or learns in passing, and the key conditional independence relationships that enable its generalization ability.
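To make the setting concrete, here is a minimal sketch of zero-shot prediction with a contrastively pre-trained multimodal model. The encoders below are stand-in random projections (the real setting would use the image and text towers of a CLIP-style model); classes on the downstream task are described only by text prompts, and each input is assigned the class whose prompt embedding is most similar in the shared space. All names and dimensions are illustrative assumptions, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "pre-trained" encoders: random projections into a shared embedding
# space. In practice these would be contrastively trained image/text towers.
D_IMG, D_TXT, D_EMB = 512, 256, 64
W_img = rng.normal(size=(D_IMG, D_EMB))
W_txt = rng.normal(size=(D_TXT, D_EMB))

def embed(x, W):
    """Project into the shared space and L2-normalize."""
    z = x @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

# Downstream task with no labeled data: classes are given by text prompts.
class_prompts = ["a photo of a cat", "a photo of a dog", "a photo of a bird"]
# Stand-in text features, one row per prompt (a real model would tokenize/encode).
T = embed(rng.normal(size=(len(class_prompts), D_TXT)), W_txt)

# A batch of stand-in image features.
X = embed(rng.normal(size=(5, D_IMG)), W_img)

# Zero-shot prediction: pick the class whose prompt embedding has the highest
# cosine similarity (dot product of unit vectors) with the input embedding.
scores = X @ T.T                 # (5, 3) similarity matrix
preds = scores.argmax(axis=1)    # predicted class index per image
```

No task-specific labels are used at any point; the text prompts alone define the label space, which is exactly the regime the paper's theory addresses.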

Cite

Text

Mehta and Harchaoui. "A Generalization Theory for Zero-Shot Prediction." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Mehta and Harchaoui. "A Generalization Theory for Zero-Shot Prediction." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/mehta2025icml-generalization/)

BibTeX

@inproceedings{mehta2025icml-generalization,
  title     = {{A Generalization Theory for Zero-Shot Prediction}},
  author    = {Mehta, Ronak and Harchaoui, Zaid},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {43603--43660},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/mehta2025icml-generalization/}
}