Selective Sampling with Co-Testing: Preliminary Results

Abstract

We present a novel approach to selective sampling, co-testing, which applies to problems with redundant views (i.e., problems with multiple disjoint sets of attributes that can be used for learning). The main idea behind co-testing is to select queries from among the unlabeled examples on which the existing views disagree.

Selective sampling (Seung, Opper, & Sompolinsky 1992), a form of active learning, reduces the number of training examples that must be labeled by examining unlabeled examples and selecting the most informative ones for the human to label. We introduce co-testing, a novel approach to selective sampling for domains with redundant views. A domain has redundant views if there are at least two mutually exclusive sets of features that can be used to learn the target concept. Our work was inspired by Blum & Mitchell (1998), who noted that there are many real-world domains with multiple views. For example, in Web page classification, one can identify faculty home pages either from the words on the page or from the words in the HTML anchors pointing to the page.

Active learning algorithms ask the user to label an example that maximizes the information conveyed to the learner (we refer to such selected examples as queries). In a standard, single-view learning scenario, this generally translates into finding an example that splits the version space in half, i.e., eliminates half of the hypotheses consistent with the training set. With redundant views, we can do much better. Co-testing simultaneously trains a separate classifier for each redundant view. Each classifier is applied to a pool of unlabeled examples, and the system selects a query based on the degree of disagreement among the learners. Because the target hypotheses in each view must agree, co-testing can reduce the hypothesis space faster than would otherwise be possible.

To illustrate this, consider a learning problem with two views, A and B. For illustrative purposes, imagine an extreme case in which there is an unlabeled example that is classified as positive by a single hypothesis from the A version space; furthermore, assume that this example is classified as positive by all but one of the hypotheses from the B version space. If the
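The query-selection loop described above (train one classifier per view, then query where the views disagree) can be sketched in a few lines. This is a minimal toy illustration, not the paper's actual setup: the data generator and the 1-D threshold learner are invented here for concreteness, and the disagreement set corresponds to what the abstract calls the queries.

```python
import random

random.seed(0)

# Toy data: each example has two redundant views of the same quantity;
# the label is determined by view A's sign. (Hypothetical setup --
# the paper itself evaluates co-testing on different learning tasks.)
def make_example():
    a = random.uniform(-1, 1)
    b = a + random.gauss(0, 0.3)          # view B: a noisy copy of view A
    return (a, b, 1 if a > 0 else 0)

def learn_threshold(labeled, view):
    # Single-view learner: pick the 1-D cutoff with fewest training errors.
    candidates = [ex[view] for ex in labeled] + [0.0]
    def errors(t):
        return sum((ex[view] > t) != (ex[2] == 1) for ex in labeled)
    return min(candidates, key=errors)

def cotest_queries(labeled, unlabeled):
    """Return the unlabeled examples on which the two single-view
    classifiers disagree -- co-testing's candidate queries."""
    t_a = learn_threshold(labeled, 0)
    t_b = learn_threshold(labeled, 1)
    return [ex for ex in unlabeled if (ex[0] > t_a) != (ex[1] > t_b)]

pool = [make_example() for _ in range(200)]
labeled, unlabeled = pool[:10], pool[10:]
queries = cotest_queries(labeled, unlabeled)
print(len(queries), "disagreement points out of", len(unlabeled))
```

In a full selective-sampling loop, the system would ask the user to label one of these disagreement points, add it to the labeled set, and retrain both views.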

Cite

Text

Muslea et al. "Selective Sampling with Co-Testing: Preliminary Results." AAAI Conference on Artificial Intelligence, 2000.


BibTeX

@inproceedings{muslea2000aaai-selective-a,
  title     = {{Selective Sampling with Co-Testing: Preliminary Results}},
  author    = {Muslea, Ion and Minton, Steven and Knoblock, Craig A.},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2000},
  pages     = {1107},
  url       = {https://mlanthology.org/aaai/2000/muslea2000aaai-selective-a/}
}