A Theoretical Characterization of Semi-Supervised Learning with Self-Training for Gaussian Mixture Models
Abstract
Self-training is a classical approach in semi-supervised learning that has been successfully applied to a variety of machine learning problems. Self-training algorithms generate pseudo-labels for the unlabeled examples and progressively refine these pseudo-labels, which ideally coincide with the actual labels. This work provides theoretical insights into self-training algorithms with a focus on linear classifiers. First, we provide a sample complexity analysis for Gaussian mixture models (GMMs) with two components. This is established by a sharp non-asymptotic characterization of the self-training iterations, which captures the evolution of the model accuracy in terms of a fixed-point iteration. Our analysis reveals the provable benefits of rejecting samples with low confidence and demonstrates how self-training iterations can gracefully improve the model accuracy. Second, we study a generalized GMM where the component means follow a distribution. We demonstrate that ridge regularization and class margin (i.e., the separation between the component means) are crucial for success, and that a lack of regularization may prevent self-training from identifying the core features in the data.
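To make the loop described in the abstract concrete, here is a minimal NumPy sketch of self-training a linear classifier on a two-component GMM with low-confidence rejection and ridge regularization. It is an illustration under our own assumptions, not the authors' exact algorithm: the symmetric mixture x = y·μ + z with z ~ N(0, I), the ridge least-squares fit, the rule of keeping a fixed fraction `keep_frac` of the most confident unlabeled points, refitting only on pseudo-labeled data, and all names (`sample_gmm`, `ridge_fit`, `self_train`, `accuracy`) and parameter values are hypothetical.

```python
import numpy as np


def sample_gmm(n, mu, rng):
    """Assumed model: symmetric two-component GMM x = y * mu + z, z ~ N(0, I)."""
    y = rng.choice(np.array([-1.0, 1.0]), size=n)
    x = y[:, None] * mu[None, :] + rng.standard_normal((n, mu.size))
    return x, y


def ridge_fit(x, y, lam):
    """Ridge least squares: argmin_w ||y - x @ w||^2 / n + lam * ||w||^2."""
    n, d = x.shape
    return np.linalg.solve(x.T @ x / n + lam * np.eye(d), x.T @ y / n)


def self_train(x_lab, y_lab, x_unl, lam=1.0, keep_frac=0.5, iters=5):
    """Hypothetical self-training loop with confidence-based rejection."""
    w = ridge_fit(x_lab, y_lab, lam)          # initial model from labeled data only
    k = max(1, int(keep_frac * len(x_unl)))   # pseudo-labels kept per round
    for _ in range(iters):
        scores = x_unl @ w
        pseudo = np.sign(scores)              # pseudo-labels from the current model
        keep = np.argsort(-np.abs(scores))[:k]  # most confident points (largest |score|)
        w = ridge_fit(x_unl[keep], pseudo[keep], lam)  # refit on kept pseudo-labels only
    return w


def accuracy(w, x, y):
    """Fraction of points whose predicted sign matches the true label."""
    return float(np.mean(np.sign(x @ w) == y))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 20
    mu = np.zeros(d)
    mu[0] = 2.0                               # hypothetical mean; ||mu|| sets the margin
    x_lab, y_lab = sample_gmm(20, mu, rng)
    x_unl, _ = sample_gmm(5000, mu, rng)      # true labels are discarded
    x_test, y_test = sample_gmm(5000, mu, rng)
    w0 = ridge_fit(x_lab, y_lab, lam=1.0)
    w = self_train(x_lab, y_lab, x_unl)
    print("labeled-only accuracy:", accuracy(w0, x_test, y_test))
    print("self-trained accuracy:", accuracy(w, x_test, y_test))
```

In this toy setup, setting `keep_frac=1.0` disables rejection, and varying `lam` or the norm of `mu` is one way to probe the abstract's claims about regularization and class margin; the sketch itself makes no claim about the paper's quantitative results.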
Cite
Oymak and Cihad Gulcu. "A Theoretical Characterization of Semi-Supervised Learning with Self-Training for Gaussian Mixture Models." Artificial Intelligence and Statistics, 2021.
[Oymak and Cihad Gulcu. "A Theoretical Characterization of Semi-Supervised Learning with Self-Training for Gaussian Mixture Models." Artificial Intelligence and Statistics, 2021.](https://mlanthology.org/aistats/2021/oymak2021aistats-theoretical/)
@inproceedings{oymak2021aistats-theoretical,
title = {{A Theoretical Characterization of Semi-Supervised Learning with Self-Training for Gaussian Mixture Models}},
author = {Oymak, Samet and Cihad Gulcu, Talha},
booktitle = {Artificial Intelligence and Statistics},
year = {2021},
pages = {3601--3609},
volume = {130},
url = {https://mlanthology.org/aistats/2021/oymak2021aistats-theoretical/}
}