Semi-Crowdsourced Clustering with Deep Generative Models

Abstract

We consider the semi-supervised clustering problem where crowdsourcing provides noisy pairwise comparisons on a small subset of the data, i.e., whether the two samples in a pair belong to the same cluster. We propose a new approach that combines a deep generative model (DGM), which characterizes low-level features of the data, with a statistical relational model for the noisy pairwise annotations on that subset. The two parts share the latent variables. To let the model automatically trade off its complexity against its fit to the data, we also develop a fully Bayesian variant. The challenge of inference is addressed by fast (natural-gradient) stochastic variational inference algorithms, which combine variational message passing for the relational part and amortized learning of the DGM under a unified framework. Empirical results on synthetic and real-world datasets show that our model outperforms previous crowdsourced clustering methods.
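
To make the idea of a noisy pairwise annotation concrete, the following is a minimal sketch of one possible annotator likelihood. It is not taken from the abstract: it assumes a two-parameter annotator (a hypothetical `alpha` for the chance of answering "same" when the pair truly shares a cluster, and `beta` for the chance of answering "different" when it does not), and it assumes the two soft cluster assignments `q_i` and `q_j` are independent. The function name and all parameters are illustrative.

```python
import numpy as np


def pairwise_label_likelihood(q_i, q_j, label, alpha, beta):
    """Probability of one annotator's answer for a sample pair.

    q_i, q_j : soft cluster-assignment vectors, each summing to 1
    label    : 1 if the annotator said "same cluster", 0 otherwise
    alpha    : assumed P(says same | truly same cluster)
    beta     : assumed P(says different | truly different clusters)
    """
    # Probability that the pair shares a cluster, assuming the two
    # assignments are independent.
    p_same = float(np.dot(q_i, q_j))
    # Marginalize over the unknown true relation of the pair.
    p_says_same = alpha * p_same + (1.0 - beta) * (1.0 - p_same)
    return p_says_same if label == 1 else 1.0 - p_says_same
```

With hard assignments to the same cluster the function returns `alpha` for a "same" answer, and with hard assignments to different clusters it returns `1 - beta`, so an unreliable annotator pulls both values toward 0.5.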

Cite

Text

Luo et al. "Semi-Crowdsourced Clustering with Deep Generative Models." Neural Information Processing Systems, 2018.

Markdown

[Luo et al. "Semi-Crowdsourced Clustering with Deep Generative Models." Neural Information Processing Systems, 2018.](https://mlanthology.org/neurips/2018/luo2018neurips-semicrowdsourced/)

BibTeX

@inproceedings{luo2018neurips-semicrowdsourced,
  title     = {{Semi-Crowdsourced Clustering with Deep Generative Models}},
  author    = {Luo, Yucen and Tian, Tian and Shi, Jiaxin and Zhu, Jun and Zhang, Bo},
  booktitle = {Neural Information Processing Systems},
  year      = {2018},
  pages     = {3212-3222},
  url       = {https://mlanthology.org/neurips/2018/luo2018neurips-semicrowdsourced/}
}