A Generalization Theory for Zero-Shot Prediction
Abstract
A modern paradigm for generalization in machine learning and AI consists of pre-training a task-agnostic foundation model, typically via self-supervised and multimodal contrastive learning. The resulting representations can then be used for prediction on a downstream task for which no labeled data is available. We present a theoretical framework to better understand this approach, called zero-shot prediction. We identify the target quantities that zero-shot prediction aims to learn, or learns in passing, and the key conditional independence relationships that enable its generalization ability.
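As an illustration (not taken from the paper), zero-shot prediction with contrastively trained embeddings is commonly implemented as nearest-neighbor classification in the shared embedding space: embed the input and a text description of each class, then predict the class whose embedding has the highest cosine similarity. The function and the toy embeddings below are hypothetical, for exposition only.

```python
import numpy as np

def zero_shot_predict(x_emb, class_embs):
    """Predict the class whose embedding is most cosine-similar to x_emb.

    x_emb: (d,) embedding of the input (e.g., an image).
    class_embs: (k, d) embeddings of text descriptions of the k classes.
    """
    # Normalize to unit length so dot products equal cosine similarities.
    x = x_emb / np.linalg.norm(x_emb)
    C = class_embs / np.linalg.norm(class_embs, axis=1, keepdims=True)
    scores = C @ x
    return int(np.argmax(scores)), scores

# Toy, hand-picked embeddings (illustrative only): the input points
# mostly along the first class direction.
class_embs = np.array([[1.0, 0.0], [0.0, 1.0]])
x_emb = np.array([0.9, 0.1])
pred, scores = zero_shot_predict(x_emb, class_embs)
# pred is 0: the input is closest to class 0's embedding.
```

No labeled examples of the downstream task are used; the class "labels" enter only through their text embeddings, which is what makes the procedure zero-shot.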
Cite
Text
Mehta and Harchaoui. "A Generalization Theory for Zero-Shot Prediction." Proceedings of the 42nd International Conference on Machine Learning, 2025.
Markdown
[Mehta and Harchaoui. "A Generalization Theory for Zero-Shot Prediction." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/mehta2025icml-generalization/)
BibTeX
@inproceedings{mehta2025icml-generalization,
title = {{A Generalization Theory for Zero-Shot Prediction}},
author = {Mehta, Ronak and Harchaoui, Zaid},
booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
year = {2025},
pages = {43603--43660},
volume = {267},
url = {https://mlanthology.org/icml/2025/mehta2025icml-generalization/}
}