Semi-Supervised Learning with Penalized Probabilistic Clustering

Abstract

While clustering is usually an unsupervised operation, there are circumstances in which we believe (with varying degrees of certainty) that items A and B should be assigned to the same cluster, while items A and C should not. We would like such pairwise relations to influence cluster assignments of out-of-sample data in a manner consistent with the prior knowledge expressed in the training set. Our starting point is probabilistic clustering based on Gaussian mixture models (GMM) of the data distribution. We express clustering preferences in the prior distribution over assignments of data points to clusters. This prior penalizes cluster assignments according to the degree to which they violate the preferences. We fit the model parameters with EM. Experiments on a variety of data sets show that penalized probabilistic clustering (PPC) can consistently improve clustering results.
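The idea of a prior that penalizes assignments violating pairwise preferences can be illustrated with a small sketch. Everything specific here is an assumption made for illustration and is not stated in the abstract: the signed weight matrix `W`, the function name `preference_log_penalty`, and the choice to score an assignment by summing the weights of pairs placed in the same cluster. The sketch covers only the preference term; it does not include the Gaussian mixture likelihood or the EM fit.

```python
import numpy as np

def preference_log_penalty(z, W):
    """Unnormalized log-prior contribution from pairwise preferences.

    z: (n,) integer cluster assignments.
    W: (n, n) symmetric weights (hypothetical encoding): W[i, j] > 0 prefers
       items i and j in the same cluster, W[i, j] < 0 prefers them apart,
       and 0 expresses no preference. |W[i, j]| encodes the certainty.
    """
    z = np.asarray(z)
    W = np.asarray(W, dtype=float)
    same = z[:, None] == z[None, :]                      # True where z_i == z_j
    upper = np.triu(np.ones(W.shape, dtype=bool), k=1)   # count each pair once
    return float(np.sum(W[same & upper]))

# Items 0, 1, 2 play the roles of A, B, C:
# A and B should share a cluster; A and C should not.
W = np.zeros((3, 3))
W[0, 1] = W[1, 0] = 2.0
W[0, 2] = W[2, 0] = -2.0

consistent = preference_log_penalty([0, 0, 1], W)  # A, B together; C apart
violating = preference_log_penalty([0, 1, 0], W)   # A, C together; B apart
print(consistent, violating)
```

Under this encoding, an assignment that respects both preferences scores higher than one that violates them, so a prior proportional to the exponential of this score would favor the consistent assignment.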

Cite

Text

Lu and Leen. "Semi-Supervised Learning with Penalized Probabilistic Clustering." Neural Information Processing Systems, 2004.

Markdown

[Lu and Leen. "Semi-Supervised Learning with Penalized Probabilistic Clustering." Neural Information Processing Systems, 2004.](https://mlanthology.org/neurips/2004/lu2004neurips-semisupervised/)

BibTeX

@inproceedings{lu2004neurips-semisupervised,
  title     = {{Semi-Supervised Learning with Penalized Probabilistic Clustering}},
  author    = {Lu, Zhengdong and Leen, Todd K.},
  booktitle = {Neural Information Processing Systems},
  year      = {2004},
  pages     = {849-856},
  url       = {https://mlanthology.org/neurips/2004/lu2004neurips-semisupervised/}
}