Enhancing Semi-Supervised Learning with Zero-Shot Pseudolabels
Abstract
The high cost of data labeling presents a major barrier to deploying machine learning systems at scale. Semi-supervised learning (SSL) mitigates this challenge by utilizing unlabeled data alongside limited labeled examples, while the emergence of foundation models (FMs) offers powerful zero-shot capabilities that can further reduce labeling cost. However, directly fine-tuning large FMs is often impractical in resource-constrained settings, and naïvely using their pseudo-labels for unlabeled data can degrade performance because those labels may be unreliable or mismatched with the target domain. In this work, we introduce ZeroMatch, a novel SSL framework that integrates knowledge distillation with consistency-based learning to jointly leverage labeled data, unlabeled data, and pseudo-labels from FMs. ZeroMatch trains a compact student model and accesses FMs only through inference services, making it suitable for low-resource environments such as personal devices with limited compute. Experiments on six vision and language classification benchmarks show that ZeroMatch consistently outperforms standard SSL and zero-shot augmented methods, demonstrating its effectiveness and robustness across a range of foundation model qualities.
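As a rough illustration of how these pieces could fit together, the sketch below combines a supervised loss on labeled data, a FixMatch-style consistency loss on unlabeled data, and a distillation loss on FM zero-shot pseudo-labels. This is a minimal reading of the abstract, not the authors' implementation: the function name zeromatch_style_loss, the input fm_probs, and the hyperparameters tau, lambda_u, and lambda_d are all hypothetical.

import torch
import torch.nn.functional as F

def zeromatch_style_loss(student, x_lab, y_lab, x_weak, x_strong, fm_probs,
                         tau=0.95, lambda_u=1.0, lambda_d=1.0):
    """One training step of an assumed ZeroMatch-like objective.

    x_lab, y_lab     : small labeled batch
    x_weak, x_strong : weak/strong augmentations of the unlabeled batch
    fm_probs         : zero-shot class probabilities for the unlabeled batch,
                       precomputed by querying the FM's inference service
                       (the FM itself is never fine-tuned)
    """
    # (1) Supervised cross-entropy on the labeled examples.
    loss_sup = F.cross_entropy(student(x_lab), y_lab)

    # (2) Consistency regularization: confident student predictions on the
    #     weakly augmented view supervise the strongly augmented view.
    with torch.no_grad():
        probs_weak = F.softmax(student(x_weak), dim=-1)
        conf, pseudo = probs_weak.max(dim=-1)
        mask = (conf >= tau).float()  # keep only confident pseudo-labels
    loss_cons = (F.cross_entropy(student(x_strong), pseudo,
                                 reduction="none") * mask).mean()

    # (3) Knowledge distillation from FM zero-shot pseudo-labels as soft
    #     targets, so the compact student absorbs FM knowledge without
    #     touching FM weights.
    log_probs = F.log_softmax(student(x_weak), dim=-1)
    loss_dist = F.kl_div(log_probs, fm_probs, reduction="batchmean")

    return loss_sup + lambda_u * loss_cons + lambda_d * loss_dist

Because the FM enters only through fm_probs, the compact student can be trained on-device while the FM is queried once per unlabeled example, which matches the inference-only access constraint described in the abstract.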
Cite
Text
Chung and Chen. "Enhancing Semi-Supervised Learning with Zero-Shot Pseudolabels." Transactions on Machine Learning Research, 2026.

Markdown
[Chung and Chen. "Enhancing Semi-Supervised Learning with Zero-Shot Pseudolabels." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/chung2026tmlr-enhancing/)

BibTeX
@article{chung2026tmlr-enhancing,
title = {{Enhancing Semi-Supervised Learning with Zero-Shot Pseudolabels}},
author = {Chung, Jichan and Chen, Irene Y.},
journal = {Transactions on Machine Learning Research},
year = {2026},
url = {https://mlanthology.org/tmlr/2026/chung2026tmlr-enhancing/}
}