Generalization Guarantees of Self-Training of Halfspaces Under Label Noise Corruption

Lies Hadjadj, Massih-Reza Amini, Sana Louhichi

IJCAI 2023 pp. 3777-3785

doi:10.24963/IJCAI.2023/420

Abstract

We investigate the generalization properties of a self-training algorithm with halfspaces. The approach learns a list of halfspaces iteratively from labeled and unlabeled training data, in which each iteration consists of two steps: exploration and pruning. In the exploration phase, the halfspace is found sequentially by maximizing the unsigned margin among unlabeled examples and then assigning pseudo-labels to those whose distance to the halfspace exceeds the current threshold. These pseudo-labels are assumed to be corrupted by noise. The training set is then augmented with the noisy pseudo-labeled examples, and a new classifier is trained. This process is repeated until no unlabeled examples remain for pseudo-labeling. In the pruning phase, pseudo-labeled samples whose distance to the last halfspace is greater than the associated unsigned margin are discarded. We prove that the misclassification error of the resulting sequence of classifiers is bounded and show that the resulting semi-supervised approach never degrades performance compared to the classifier learned using only the initial labeled training set. Experiments carried out on a variety of benchmarks demonstrate the efficiency of the proposed approach compared to state-of-the-art methods.
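The exploration loop described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes a perceptron as a stand-in halfspace learner, replaces the sequential margin maximization with a hypothetical fixed, decreasing schedule of thresholds (`thresholds`), and omits the pruning phase entirely. The names `train_perceptron`, `unsigned_margin`, and `self_train` are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def train_perceptron(xs: List[Point], ys: List[int], epochs: int = 100) -> Tuple[List[float], float]:
    """Fit a halfspace sign(w.x + b) with the classic perceptron update (stand-in learner)."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(xs, ys):
            if y * (dot(w, x) + b) <= 0:  # misclassified or on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # training data is separated
            break
    return w, b


def unsigned_margin(w: Sequence[float], b: float, x: Point) -> float:
    """Euclidean distance |w.x + b| / ||w|| from x to the hyperplane."""
    norm = math.sqrt(dot(w, w))
    return abs(dot(w, x) + b) / norm if norm > 0 else 0.0


def self_train(labeled_x, labeled_y, unlabeled_x, thresholds):
    """Margin-threshold self-training; `thresholds` is a hypothetical decreasing schedule."""
    xs, ys = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    pseudo: List[Tuple[Point, int]] = []
    w, b = train_perceptron(xs, ys)  # classifier learned from the labeled set only
    for theta in thresholds:
        confident = [x for x in pool if unsigned_margin(w, b, x) >= theta]
        if not confident:
            continue
        for x in confident:
            label = 1 if dot(w, x) + b > 0 else -1  # pseudo-label, possibly noisy
            xs.append(x)
            ys.append(label)
            pseudo.append((x, label))
        pool = [x for x in pool if unsigned_margin(w, b, x) < theta]
        w, b = train_perceptron(xs, ys)  # retrain on the augmented training set
        if not pool:  # no unlabeled examples left to pseudo-label
            break
    return w, b, pseudo


if __name__ == "__main__":
    w, b, pseudo = self_train(
        [(2.0, 0.0), (-2.0, 0.0)], [1, -1],
        [(3.0, 1.0), (-3.0, -1.0), (0.5, 4.0)],
        thresholds=[3.0, 2.0, 0.5],
    )
    print(w, b, [lab for _, lab in pseudo])  # [2.0, 0.0] 1.0 [1, -1, 1]
```

The perceptron is used only because it is short; any halfspace learner, such as a margin-maximizing linear SVM, could be substituted without changing the structure of the loop.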

Cite

Text

Hadjadj et al. "Generalization Guarantees of Self-Training of Halfspaces Under Label Noise Corruption." International Joint Conference on Artificial Intelligence, 2023. doi:10.24963/IJCAI.2023/420

BibTeX

@inproceedings{hadjadj2023ijcai-generalization,
  title     = {{Generalization Guarantees of Self-Training of Halfspaces Under Label Noise Corruption}},
  author    = {Hadjadj, Lies and Amini, Massih-Reza and Louhichi, Sana},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2023},
  pages     = {3777-3785},
  doi       = {10.24963/IJCAI.2023/420},
  url       = {https://mlanthology.org/ijcai/2023/hadjadj2023ijcai-generalization/}
}