A Correlated Worker Model for Grouped, Imbalanced and Multitask Data

Abstract

We consider the important crowdsourcing problem of estimating worker confusion matrices or, for binary classification tasks, worker sensitivities and specificities. In addition to providing diagnostic insights into worker performance, such estimates enable robust online task routing for classification tasks exhibiting imbalance and asymmetric costs. However, labeled data is often expensive, so estimates must be made from few labeled examples. This poses a challenge to existing methods. In this paper, we propose a novel model that captures the correlations between entries in confusion matrices. We apply this model in two practical scenarios: (1) an imbalanced classification task in which workers are known to belong to groups and (2) a multitask scenario in which labels for the same workers are available in more than one labeling task. We derive an efficient variational inference approach that scales to large datasets. Experiments on two real-world citizen science datasets (biomedical citation screening and galaxy morphological classification) demonstrate consistent improvement over competitive baselines. We have made our source code available.
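To make the quantities in the abstract concrete, the sketch below computes each worker's empirical sensitivity and specificity from items with known gold labels. It is not the correlated model or the variational inference proposed in the paper. It is only the simple count-based estimate, which becomes unreliable when few labeled items are available. The function name `worker_sensitivity_specificity` and the input layout (tuples of worker, item, label) are assumptions made for illustration.

```python
from collections import defaultdict


def worker_sensitivity_specificity(labels, gold):
    """Count-based estimate of per-worker sensitivity and specificity.

    labels: iterable of (worker_id, item_id, label), label in {0, 1}
    gold:   dict mapping item_id -> true label in {0, 1}
    Returns a dict worker_id -> (sensitivity, specificity); an entry is
    None when the worker labeled no gold items of that class.
    (Hypothetical helper for illustration, not the paper's method.)
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for worker, item, label in labels:
        if item not in gold:
            continue  # skip items without a gold label
        if gold[item] == 1:
            key = "tp" if label == 1 else "fn"
        else:
            key = "tn" if label == 0 else "fp"
        counts[worker][key] += 1

    estimates = {}
    for worker, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        sensitivity = c["tp"] / positives if positives else None
        specificity = c["tn"] / negatives if negatives else None
        estimates[worker] = (sensitivity, specificity)
    return estimates


# Toy usage with made-up data: two workers, three gold-labeled items.
example_labels = [
    ("w1", "a", 1), ("w1", "b", 0), ("w1", "c", 0),
    ("w2", "a", 0), ("w2", "c", 0),
]
example_gold = {"a": 1, "b": 1, "c": 0}
example_estimates = worker_sensitivity_specificity(example_labels, example_gold)
```

With only a handful of gold items per worker, these ratios take very coarse values, which illustrates why estimating them from little labeled data is hard.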

Cite

Text

Nguyen et al. "A Correlated Worker Model for Grouped, Imbalanced and Multitask Data." Conference on Uncertainty in Artificial Intelligence, 2016.

Markdown

[Nguyen et al. "A Correlated Worker Model for Grouped, Imbalanced and Multitask Data." Conference on Uncertainty in Artificial Intelligence, 2016.](https://mlanthology.org/uai/2016/nguyen2016uai-correlated/)

BibTeX

@inproceedings{nguyen2016uai-correlated,
  title     = {{A Correlated Worker Model for Grouped, Imbalanced and Multitask Data}},
  author    = {Nguyen, An T. and Wallace, Byron C. and Lease, Matthew},
  booktitle = {Conference on Uncertainty in Artificial Intelligence},
  year      = {2016},
  url       = {https://mlanthology.org/uai/2016/nguyen2016uai-correlated/}
}