Enhancing Semi-Supervised Learning with Zero-Shot Pseudolabels

Abstract

The high cost of data labeling presents a major barrier to deploying machine learning systems at scale. Semi-supervised learning (SSL) mitigates this challenge by utilizing unlabeled data alongside limited labeled examples, while the emergence of foundation models (FMs) offers powerful zero-shot capabilities that can further reduce labeling cost. However, directly fine-tuning large FMs is often impractical in resource-constrained settings, and naïvely using their pseudo-labels for unlabeled data can degrade performance due to their unreliability or domain mismatch with the target task. In this work, we introduce ZeroMatch, a novel SSL framework that integrates knowledge distillation with consistency-based learning to jointly leverage labeled data, unlabeled data, and pseudo-labels from FMs. ZeroMatch trains a compact student model and accesses FMs only through inference services, making it suitable for low-resource environments such as personal devices with limited compute. Experiments on six vision and language classification benchmarks show that ZeroMatch consistently outperforms standard SSL and zero-shot-augmented methods, demonstrating its effectiveness and robustness across a range of foundation model qualities.
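
The abstract describes a three-part training signal: supervised loss on labeled data, consistency regularization on unlabeled data, and distillation from FM zero-shot pseudo-labels. Below is a minimal PyTorch sketch of what such a combined objective could look like, assuming a FixMatch-style consistency term and a cross-entropy distillation term; the function name, loss weights, and confidence threshold are illustrative assumptions, not the paper's actual formulation.

import torch
import torch.nn.functional as F

# Hypothetical sketch of a ZeroMatch-style combined objective. The paper's
# exact loss is not reproduced here; this assumes a FixMatch-style
# consistency term plus a distillation term on FM zero-shot pseudo-labels.
# The names (tau, lambda_u, lambda_fm) and default weights are illustrative.
def zeromatch_loss(student, x_lab, y_lab, x_weak, x_strong, fm_pseudo,
                   tau=0.95, lambda_u=1.0, lambda_fm=1.0):
    # (1) Supervised cross-entropy on the small labeled set.
    loss_sup = F.cross_entropy(student(x_lab), y_lab)

    # (2) FixMatch-style consistency: pseudo-label weakly augmented inputs,
    #     keep only confident predictions, train on strong augmentations.
    with torch.no_grad():
        probs = F.softmax(student(x_weak), dim=-1)
        conf, pseudo = probs.max(dim=-1)
        mask = (conf >= tau).float()
    loss_cons = (F.cross_entropy(student(x_strong), pseudo,
                                 reduction="none") * mask).mean()

    # (3) Distillation from the foundation model's zero-shot pseudo-labels
    #     (class indices obtained once via an inference service, so the FM
    #     itself is never fine-tuned).
    loss_distill = F.cross_entropy(student(x_weak), fm_pseudo)

    return loss_sup + lambda_u * loss_cons + lambda_fm * loss_distill

Only the compact student receives gradients here; the FM contributes fixed pseudo-labels through inference, which is consistent with the low-resource setting the abstract describes.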

Cite

Text

Chung and Chen. "Enhancing Semi-Supervised Learning with Zero-Shot Pseudolabels." Transactions on Machine Learning Research, 2026.

Markdown

[Chung and Chen. "Enhancing Semi-Supervised Learning with Zero-Shot Pseudolabels." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/chung2026tmlr-enhancing/)

BibTeX

@article{chung2026tmlr-enhancing,
  title     = {{Enhancing Semi-Supervised Learning with Zero-Shot Pseudolabels}},
  author    = {Chung, Jichan and Chen, Irene Y.},
  journal   = {Transactions on Machine Learning Research},
  year      = {2026},
  url       = {https://mlanthology.org/tmlr/2026/chung2026tmlr-enhancing/}
}