In All Likelihoods: Robust Selection of Pseudo-Labeled Data
Abstract
Self-training is a simple yet effective method within semi-supervised learning. Its rationale is to iteratively enhance the training data by adding pseudo-labeled data. Its generalization performance depends heavily on which pseudo-labeled data are selected, a step known as pseudo-label selection (PLS). In this paper, we render PLS more robust with respect to the modeling assumptions involved. To this end, we treat PLS as a decision problem, which allows us to introduce a generalized utility function. The idea is to select pseudo-labeled data that maximize a multi-objective utility function. We demonstrate that the latter can be constructed to account for different sources of uncertainty and explore three examples: model selection, accumulation of errors, and covariate shift. In the absence of second-order information on such uncertainties, we furthermore consider the generic approach of the generalized Bayesian $\alpha$-cut updating rule for credal sets. We spotlight the application of three of our robust extensions on simulated data and on three real-world data sets. In a benchmarking study, we compare these extensions to traditional PLS methods. Results suggest that robustness with regard to model choice can lead to substantial accuracy gains.
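The self-training loop that the abstract describes can be illustrated with a minimal sketch. It is not the authors' method: the toy 1-D nearest-centroid classifier, the function names (`fit_centroids`, `predict`, `self_train`), and the default confidence-margin utility are all assumptions made for illustration, and the sketch assumes at least two classes. The `utility` argument marks the place where a more robust, multi-objective selection criterion could be plugged in.

```python
from statistics import mean


def fit_centroids(points, labels):
    """Fit a toy 1-D nearest-centroid classifier: one mean per class."""
    return {c: mean(x for x, y in zip(points, labels) if y == c)
            for c in set(labels)}


def predict(centroids, x):
    """Return (predicted class, margin); a larger margin means more confident."""
    ranked = sorted(centroids, key=lambda c: abs(x - centroids[c]))
    best, runner_up = ranked[0], ranked[1]
    margin = abs(x - centroids[runner_up]) - abs(x - centroids[best])
    return best, margin


def self_train(labeled_x, labeled_y, unlabeled_x, utility=None):
    """Repeatedly add the highest-utility pseudo-labeled point to the training data."""
    xs, ys, pool = list(labeled_x), list(labeled_y), list(unlabeled_x)
    if utility is None:
        # Assumed default: select by the current model's confidence margin.
        utility = lambda model, x: predict(model, x)[1]
    while pool:
        model = fit_centroids(xs, ys)                      # refit on current data
        chosen = max(pool, key=lambda x: utility(model, x))  # selection step (PLS)
        pool.remove(chosen)
        xs.append(chosen)
        ys.append(predict(model, chosen)[0])               # attach the pseudo-label
    return fit_centroids(xs, ys), xs, ys


# Hypothetical toy data: two labeled points and four unlabeled ones.
model, xs, ys = self_train([0.0, 10.0], [0, 1], [1.0, 9.0, 4.0, 6.0])
```

Because each pseudo-label comes from the model that is later refit on it, an early wrong selection propagates through subsequent iterations, which is why the choice of selection criterion matters.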
Cite
Text
Rodemann et al. "In All Likelihoods: Robust Selection of Pseudo-Labeled Data." Proceedings of the Thirteenth International Symposium on Imprecise Probability: Theories and Applications, 2023.
Markdown
[Rodemann et al. "In All Likelihoods: Robust Selection of Pseudo-Labeled Data." Proceedings of the Thirteenth International Symposium on Imprecise Probability: Theories and Applications, 2023.](https://mlanthology.org/isipta/2023/rodemann2023isipta-all/)
BibTeX
@inproceedings{rodemann2023isipta-all,
title = {{In All Likelihoods: Robust Selection of Pseudo-Labeled Data}},
author = {Rodemann, Julian and Jansen, Christoph and Schollmeyer, Georg and Augustin, Thomas},
booktitle = {Proceedings of the Thirteenth International Symposium on Imprecise Probability: Theories and Applications},
year = {2023},
pages = {412-425},
volume = {215},
url = {https://mlanthology.org/isipta/2023/rodemann2023isipta-all/}
}