Crowdclustering with Partition Labels

Abstract

Crowdclustering is a practical way to incorporate domain knowledge into clustering by combining opinions from multiple domain experts. Existing crowdclustering methods analyze binary pairwise similarity labels. In some applications, however, experts may provide partition labels, and converting these partition labels into pairwise similarity labels makes it difficult to understand the relationships between the clustering solutions of different experts. In this paper, we propose a crowdclustering model that directly analyzes partition labels. The proposed model adopts a novel approach based on a modified multinomial logistic regression model, which simultaneously learns the number of clusters and determines the hyperplanes that partition samples into clusters. The model also learns a mapping between the latent clusters and the expert labels, revealing the agreements and disagreements between experts. Experiments on benchmark data demonstrate that the proposed model simultaneously learns the number of clusters and discovers the clustering structure. An experiment on a disease subtyping problem illustrates that the proposed model helps us understand the agreements and disagreements between experts.
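To make the contrast with pairwise methods concrete, the following minimal Python sketch (an illustration, not code from the paper) shows the conversion the abstract refers to: each expert's partition labels become one binary similarity label per pair of samples. The function name `partition_to_pairwise` and the toy expert labels are hypothetical. Because only co-membership survives the conversion, expert A's "mild"/"severe" partition and expert B's 2/1 partition produce identical pairwise labels, so which of one expert's clusters corresponds to which of another's is no longer represented.

```python
from itertools import combinations


def partition_to_pairwise(labels):
    """Convert one expert's partition labels into binary pairwise similarity labels.

    labels[i] is the cluster label the expert assigned to sample i.
    Returns a dict mapping each pair (i, j) with i < j to 1 if the expert
    placed samples i and j in the same cluster, and 0 otherwise.
    (Hypothetical helper for illustration only.)
    """
    return {
        (i, j): int(labels[i] == labels[j])
        for i, j in combinations(range(len(labels)), 2)
    }


# Hypothetical partition labels from three experts for the same four samples.
expert_a = ["mild", "mild", "severe", "severe"]
expert_b = [2, 2, 1, 1]          # same grouping as expert A, different label names
expert_c = ["x", "y", "y", "y"]  # a genuinely different grouping

pa = partition_to_pairwise(expert_a)
pb = partition_to_pairwise(expert_b)
pc = partition_to_pairwise(expert_c)

print(pa == pb)  # True: the label names (and their correspondence) are lost
print(pa == pc)  # False: only differences in grouping remain visible
```

The pairwise representation also grows quadratically: each expert contributes n(n-1)/2 labels for n samples (6 labels for the 4 samples above), whereas partition labels need only n.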

Cite

Text

Chen et al. "Crowdclustering with Partition Labels." International Conference on Artificial Intelligence and Statistics, 2018.

Markdown

[Chen et al. "Crowdclustering with Partition Labels." International Conference on Artificial Intelligence and Statistics, 2018.](https://mlanthology.org/aistats/2018/chen2018aistats-crowdclustering/)

BibTeX

@inproceedings{chen2018aistats-crowdclustering,
  title     = {{Crowdclustering with Partition Labels}},
  author    = {Chen, Junxiang and Chang, Yale and Castaldi, Peter J. and Cho, Michael H. and Hobbs, Brian D. and Dy, Jennifer G.},
  booktitle = {International Conference on Artificial Intelligence and Statistics},
  year      = {2018},
  pages     = {1127--1136},
  url       = {https://mlanthology.org/aistats/2018/chen2018aistats-crowdclustering/}
}