Spectral Clustering and Labeling for Crowdsourcing with Inherently Distinct Task Types

Abstract

The Dawid-Skene model is the most widely assumed model in the analysis of crowdsourcing algorithms that estimate ground-truth labels from noisy worker responses. In this work, we are motivated by crowdsourcing applications where workers have distinct skill sets and their accuracy additionally depends on a task's type. Focusing on the case where there are two types of tasks, we propose a spectral method to partition tasks into two groups such that a worker has the same reliability for all tasks within a group. Our analysis reveals a separability condition such that task types can be perfectly recovered if the number of workers $n$ scales logarithmically with the number of tasks $d$. Numerical experiments show how clustering tasks by type before estimating ground-truth labels enhances the performance of crowdsourcing algorithms in practical applications.
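As a rough illustration of the idea of partitioning tasks by type with a spectral method, the following is a minimal sketch and not the algorithm from the paper, whose details the abstract does not give. It assumes binary labels in {-1, +1}, two equally sized task types, and a hypothetical worker population in which half the workers are accurate on one type and guess at random on the other. The similarity used here (squared agreement counts between tasks, which removes the dependence on the unknown labels) and all function names and parameter values are assumptions made for this example.

```python
import numpy as np


def simulate_responses(n_workers, n_tasks, rng, strong=0.9, weak=0.5):
    """Simulate +/-1 responses under an assumed two-type reliability model."""
    # Balanced, randomly ordered task types (n_tasks is assumed even).
    types = rng.permutation(np.repeat([0, 1], n_tasks // 2))
    labels = rng.choice([-1, 1], size=n_tasks)
    # acc[i, k] = probability that worker i is correct on a task of type k.
    acc = np.full((n_workers, 2), weak)
    acc[: n_workers // 2, 0] = strong  # first half: strong on type 0
    acc[n_workers // 2:, 1] = strong   # second half: strong on type 1
    correct = rng.random((n_workers, n_tasks)) < acc[:, types]
    responses = np.where(correct, labels, -labels)
    return responses, types


def cluster_task_types(responses):
    """Split tasks into two groups by the sign of a leading eigenvector."""
    n_tasks = responses.shape[1]
    # Squared agreement counts are invariant to flipping a task's true label.
    sim = (responses.T @ responses).astype(float) ** 2
    np.fill_diagonal(sim, 0.0)
    # Double-center so the block structure shows up in the top eigenvector.
    center = np.eye(n_tasks) - np.ones((n_tasks, n_tasks)) / n_tasks
    sim = center @ sim @ center
    _, eigvecs = np.linalg.eigh(sim)  # eigenvalues in ascending order
    return (eigvecs[:, -1] > 0).astype(int)


rng = np.random.default_rng(0)
responses, types = simulate_responses(n_workers=150, n_tasks=300, rng=rng)
pred = cluster_task_types(responses)
agree = np.mean(pred == types)
# Cluster indices are only identified up to swapping the two groups.
print("clustering accuracy:", max(agree, 1 - agree))
```

Once tasks are grouped this way, a label-estimation routine could be run separately on each group, which is the pipeline the abstract describes at a high level.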

Cite

Text

Mandal et al. "Spectral Clustering and Labeling for Crowdsourcing with Inherently Distinct Task Types." Transactions on Machine Learning Research, 2025.

Markdown

[Mandal et al. "Spectral Clustering and Labeling for Crowdsourcing with Inherently Distinct Task Types." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/mandal2025tmlr-spectral/)

BibTeX

@article{mandal2025tmlr-spectral,
  title     = {{Spectral Clustering and Labeling for Crowdsourcing with Inherently Distinct Task Types}},
  author    = {Mandal, Saptarshi and Kong, Seo Taek and Katselis, Dimitrios and Srikant, R.},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/mandal2025tmlr-spectral/}
}