Crowdclustering with Partition Labels

Chen, Junxiang; Chang, Yale; Castaldi, Peter J.; Cho, Michael H.; Hobbs, Brian D.; Dy, Jennifer G.

AISTATS 2018 pp. 1127-1136

Abstract

Crowdclustering is a practical way to incorporate domain knowledge into clustering by combining opinions from multiple domain experts. Existing crowdclustering methods analyze binary pairwise similarity labels. However, in some applications, experts might provide partition labels. Converting partition labels into pairwise similarity labels makes it difficult to understand the relationships between the clustering solutions of different experts. In this paper, we propose a crowdclustering model that directly analyzes partition labels. The proposed model adopts a novel approach based on a modified multinomial logistic regression model, which simultaneously learns the number of clusters and determines hyper-planes that partition samples into clusters. The proposed model also learns a mapping between the latent clusters and expert labels, revealing the agreements and disagreements between experts. Experiments on benchmark data demonstrate that the proposed model simultaneously learns the number of clusters and discovers the clustering structure. An experiment on a disease subtyping problem illustrates that the proposed model helps us understand the agreements and disagreements between experts.
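The ingredients named in the abstract can be illustrated with a small sketch. This is not the authors' model: it uses a plain (unmodified) multinomial logistic regression with a fixed number of latent clusters, so it does not learn the number of clusters, and the expert mapping is a hypothetical row-stochastic matrix `mapping[k][l]` = P(expert gives label l | latent cluster k). All function names here are illustrative. The first function shows the conversion the abstract argues against: after turning a partition into pairwise similarity labels, only co-membership survives, so which label of one expert corresponds to which label of another is no longer explicit.

```python
import math
from typing import List, Sequence


def partition_to_pairwise(labels: Sequence[int]) -> List[List[int]]:
    """Convert one expert's partition labels into a binary pairwise
    similarity matrix: S[i][j] = 1 iff samples i and j share a label."""
    n = len(labels)
    return [[1 if labels[i] == labels[j] else 0 for j in range(n)] for i in range(n)]


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cluster_posteriors(
    x: Sequence[float], weights: Sequence[Sequence[float]], biases: Sequence[float]
) -> List[float]:
    """Plain multinomial logistic regression (an assumption, not the paper's
    modified variant): P(z = k | x) is a softmax over the hyperplane scores
    w_k . x + b_k, one hyperplane per latent cluster k."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) + b for w, b in zip(weights, biases)]
    return softmax(scores)


def expert_label_probs(
    posterior: Sequence[float], mapping: Sequence[Sequence[float]]
) -> List[float]:
    """P(expert label l | x) = sum_k P(z = k | x) * mapping[k][l], where the
    hypothetical mapping[k][l] = P(expert gives label l | latent cluster k)."""
    n_labels = len(mapping[0])
    return [sum(p * row[l] for p, row in zip(posterior, mapping)) for l in range(n_labels)]


# Usage: two experts with the same grouping but different label names give
# identical pairwise matrices, so the label correspondence is discarded.
same = partition_to_pairwise([0, 0, 1]) == partition_to_pairwise([2, 2, 5])

# Three latent clusters; this expert merges latent clusters 1 and 2 into label 1.
post = cluster_posteriors([2.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], [0.0, 0.0, 0.0])
probs = expert_label_probs(post, [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
```

In this sketch, an expert who lumps two latent clusters together shows up as two rows of `mapping` that put their mass on the same label, which is the kind of agreement or disagreement pattern a learned cluster-to-label mapping can expose and a pairwise similarity matrix cannot state directly.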

Cite

Text

Chen et al. "Crowdclustering with Partition Labels." International Conference on Artificial Intelligence and Statistics, 2018.

Markdown

[Chen et al. "Crowdclustering with Partition Labels." International Conference on Artificial Intelligence and Statistics, 2018.](https://mlanthology.org/aistats/2018/chen2018aistats-crowdclustering/)

BibTeX

@inproceedings{chen2018aistats-crowdclustering,
  title     = {{Crowdclustering with Partition Labels}},
  author    = {Chen, Junxiang and Chang, Yale and Castaldi, Peter J. and Cho, Michael H. and Hobbs, Brian D. and Dy, Jennifer G.},
  booktitle = {International Conference on Artificial Intelligence and Statistics},
  year      = {2018},
  pages     = {1127--1136},
  url       = {https://mlanthology.org/aistats/2018/chen2018aistats-crowdclustering/}
}