TrueLabel + Confusions: A Spectrum of Probabilistic Models in Analyzing Multiple Ratings

Abstract

This paper revisits the problem of analyzing multiple ratings given by different judges. Unlike previous work, which focuses on distilling the true labels from noisy crowdsourcing ratings, we emphasize gaining diagnostic insights into our in-house well-trained judges. We generalize the well-known DAWIDSKENE model (Dawid & Skene, 1979) to a spectrum of probabilistic models under the same "TrueLabel + Confusion" paradigm, and show that our proposed hierarchical Bayesian model, called HYBRIDCONFUSION, consistently outperforms DAWIDSKENE on both synthetic and real-world data sets.
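
To make the "TrueLabel + Confusion" idea concrete, the following is a minimal sketch of the classic Dawid-Skene baseline fitted with EM: each item has a latent true label, and each judge has a confusion matrix giving the probability of each rating conditional on that true label. It is not the paper's HYBRIDCONFUSION model. The function name `dawid_skene`, the majority-vote initialisation, the smoothing constant, and the toy ratings are all assumptions chosen for illustration.

```python
import numpy as np

def dawid_skene(ratings, n_classes, n_iter=50, smooth=1e-6):
    """Hypothetical helper: EM for the classic Dawid-Skene model.

    ratings: int array of shape (n_items, n_judges); -1 marks a missing rating.
    Returns (posterior over true labels, per-judge confusion matrices, class prior).
    """
    ratings = np.asarray(ratings)
    n_items, n_judges = ratings.shape

    # Initialise the posterior over true labels from per-item vote counts.
    post = np.zeros((n_items, n_classes))
    for k in range(n_classes):
        post[:, k] = (ratings == k).sum(axis=1)
    post += smooth
    post /= post.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: class prior and confusion matrices conf[j, true, rated].
        prior = post.mean(axis=0)
        conf = np.full((n_judges, n_classes, n_classes), smooth)
        for j in range(n_judges):
            for l in range(n_classes):
                mask = ratings[:, j] == l
                conf[j, :, l] += post[mask].sum(axis=0)
        conf /= conf.sum(axis=2, keepdims=True)

        # E-step: posterior over each item's true label, computed in log space.
        log_post = np.tile(np.log(prior + 1e-12), (n_items, 1))
        for j in range(n_judges):
            seen = ratings[:, j] >= 0
            log_post[seen] += np.log(conf[j][:, ratings[seen, j]]).T
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)

    return post, conf, prior

# Toy data (made up): 6 items, 3 judges, 2 classes; the third judge disagrees twice.
ratings = np.array([
    [0, 0, 0],
    [0, 0, 1],
    [1, 1, 1],
    [1, 1, 0],
    [0, 0, 0],
    [1, 1, 1],
])
post, conf, prior = dawid_skene(ratings, n_classes=2)
labels = post.argmax(axis=1)  # most probable true label per item
```

In this sketch, `conf[j]` is the diagnostic object: row `k` of judge `j`'s matrix shows how that judge tends to rate items whose inferred true label is `k`, which is the kind of per-judge insight the abstract emphasizes.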

Cite

Text

Liu and Wang. "TrueLabel + Confusions: A Spectrum of Probabilistic Models in Analyzing Multiple Ratings." International Conference on Machine Learning, 2012.

Markdown

[Liu and Wang. "TrueLabel + Confusions: A Spectrum of Probabilistic Models in Analyzing Multiple Ratings." International Conference on Machine Learning, 2012.](https://mlanthology.org/icml/2012/liu2012icml-truelabel/)

BibTeX

@inproceedings{liu2012icml-truelabel,
  title     = {{TrueLabel + Confusions: A Spectrum of Probabilistic Models in Analyzing Multiple Ratings}},
  author    = {Liu, Chao and Wang, Yi-Min},
  booktitle = {International Conference on Machine Learning},
  year      = {2012},
  url       = {https://mlanthology.org/icml/2012/liu2012icml-truelabel/}
}