Cost-Sensitive Semi-Supervised Support Vector Machine

Abstract

In this paper, we study cost-sensitive semi-supervised learning, where many of the training examples are unlabeled and different misclassification errors are associated with unequal costs. This scenario occurs in many real-world applications. For example, in disease diagnosis, the cost of erroneously diagnosing a patient as healthy is much higher than that of diagnosing a healthy person as ill. Moreover, acquiring labeled data requires a medical diagnosis, which is expensive, while collecting unlabeled data such as basic health information is much cheaper. We propose the CS4VM (Cost-Sensitive Semi-Supervised Support Vector Machine) to address this problem. We show that the CS4VM, when given the label means of the unlabeled data, closely approximates the supervised cost-sensitive SVM that has access to the ground-truth labels of all the unlabeled data. This observation leads to an efficient algorithm that first estimates the label means and then trains the CS4VM with the plug-in label means using an efficient SVM solver. Experiments on a broad range of data sets show that the proposed method is capable of reducing the total cost and is computationally efficient.
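
To make the two quantities named in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes binary labels of +1 (patient) and -1 (healthy), and it assumes that "label means" refers to the per-class mean feature vectors of the data. The function names `total_cost` and `label_means` and the cost values are hypothetical. In the semi-supervised setting the labels of the unlabeled data are unknown, so the means would have to be estimated; here they are computed from known labels only to show what is being estimated.

```python
from typing import Dict, List, Sequence


def total_cost(y_true: Sequence[int], y_pred: Sequence[int],
               cost_fn: float, cost_fp: float) -> float:
    """Sum misclassification costs with unequal costs per error type.

    cost_fn: cost of predicting -1 (healthy) for a true +1 (patient).
    cost_fp: cost of predicting +1 (patient) for a true -1 (healthy).
    """
    cost = 0.0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == -1:
            cost += cost_fn
        elif t == -1 and p == 1:
            cost += cost_fp
    return cost


def label_means(X: Sequence[Sequence[float]],
                y: Sequence[int]) -> Dict[int, List[float]]:
    """Per-class mean feature vectors (assumed meaning of 'label means')."""
    means: Dict[int, List[float]] = {}
    for label in (1, -1):
        rows = [x for x, t in zip(X, y) if t == label]
        if not rows:
            raise ValueError(f"no examples with label {label}")
        dim = len(rows[0])
        means[label] = [sum(r[j] for r in rows) / len(rows) for j in range(dim)]
    return means


if __name__ == "__main__":
    # Hypothetical costs: a missed patient is ten times as costly as a false alarm.
    y_true = [1, 1, -1, -1]
    y_pred = [1, -1, 1, -1]
    print(total_cost(y_true, y_pred, cost_fn=10.0, cost_fp=1.0))

    X = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
    y = [1, 1, -1]
    print(label_means(X, y))
```

With unequal costs, two classifiers that make the same number of errors can incur very different total cost, which is why plain error rate is not the quantity of interest in this setting.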

Cite

Text

Li et al. "Cost-Sensitive Semi-Supervised Support Vector Machine." AAAI Conference on Artificial Intelligence, 2010. doi:10.1609/AAAI.V24I1.7661

Markdown

[Li et al. "Cost-Sensitive Semi-Supervised Support Vector Machine." AAAI Conference on Artificial Intelligence, 2010.](https://mlanthology.org/aaai/2010/li2010aaai-cost/) doi:10.1609/AAAI.V24I1.7661

BibTeX

@inproceedings{li2010aaai-cost,
  title     = {{Cost-Sensitive Semi-Supervised Support Vector Machine}},
  author    = {Li, Yufeng and Kwok, James T. and Zhou, Zhi-Hua},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2010},
  pages     = {500--505},
  doi       = {10.1609/AAAI.V24I1.7661},
  url       = {https://mlanthology.org/aaai/2010/li2010aaai-cost/}
}