A PAC-Style Model for Learning from Labeled and Unlabeled Data

Abstract

There has been growing interest in practice in using unlabeled data together with labeled data in machine learning, and a number of different approaches have been developed. However, the assumptions these methods are based on are often quite distinct and not captured by standard theoretical models. In this paper we describe a PAC-style framework that can be used to model many of these assumptions, and analyze sample-complexity issues in this setting: that is, how much of each type of data one should expect to need in order to learn well, and what are the basic quantities that these numbers depend on. Our model can be viewed as an extension of the standard PAC model, where in addition to a concept class C , one also proposes a type of compatibility that one believes the target concept should have with the underlying distribution. In this view, unlabeled data can be helpful because it allows one to estimate compatibility over the space of hypotheses, and reduce the size of the search space to those that, according to one’s assumptions, are a-priori reasonable with respect to the distribution. We discuss a number of technical issues that arise in this context, and provide sample-complexity bounds both for uniform convergence and ε -cover based algorithms. We also consider algorithmic issues, and give an efficient algorithm for a special case of co-training.

Cite

Text

Balcan and Blum. "A PAC-Style Model for Learning from Labeled and Unlabeled Data." Annual Conference on Computational Learning Theory, 2005. doi:10.1007/11503415_8

Markdown

[Balcan and Blum. "A PAC-Style Model for Learning from Labeled and Unlabeled Data." Annual Conference on Computational Learning Theory, 2005.](https://mlanthology.org/colt/2005/balcan2005colt-pac/) doi:10.1007/11503415_8

BibTeX

@inproceedings{balcan2005colt-pac,
  title     = {{A PAC-Style Model for Learning from Labeled and Unlabeled Data}},
  author    = {Balcan, Maria-Florina and Blum, Avrim},
  booktitle = {Annual Conference on Computational Learning Theory},
  year      = {2005},
  pages     = {111-126},
  doi       = {10.1007/11503415_8},
  url       = {https://mlanthology.org/colt/2005/balcan2005colt-pac/}
}