Learning from Inconsistent and Unreliable Annotators by a Gaussian Mixture Model and Bayesian Information Criterion

Abstract

Supervised learning from multiple annotators is an increasingly important problem in machine leaning and data mining. This paper develops a probabilistic approach to this problem when annotators are not only unreliable, but also have varying performance depending on the data. The proposed approach uses a Gaussian mixture model (GMM) and Bayesian information criterion (BIC) to find the fittest model to approximate the distribution of the instances. Then the maximum a posterior (MAP) estimation of the hidden true labels and the maximum-likelihood (ML) estimation of quality of multiple annotators are provided alternately. Experiments on emotional speech classification and CASP9 protein disorder prediction tasks show performance improvement of the proposed approach as compared to the majority voting baseline and a previous data-independent approach. Moreover, the approach also provides more accurate estimates of individual annotators performance for each Gaussian component, thus paving the way for understanding the behaviors of each annotator.

Cite

Text

Zhang and Obradovic. "Learning from Inconsistent and Unreliable Annotators by a Gaussian Mixture Model and Bayesian Information Criterion." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2011. doi:10.1007/978-3-642-23808-6_36

Markdown

[Zhang and Obradovic. "Learning from Inconsistent and Unreliable Annotators by a Gaussian Mixture Model and Bayesian Information Criterion." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2011.](https://mlanthology.org/ecmlpkdd/2011/zhang2011ecmlpkdd-learning/) doi:10.1007/978-3-642-23808-6_36

BibTeX

@inproceedings{zhang2011ecmlpkdd-learning,
  title     = {{Learning from Inconsistent and Unreliable Annotators by a Gaussian Mixture Model and Bayesian Information Criterion}},
  author    = {Zhang, Ping and Obradovic, Zoran},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2011},
  pages     = {553-568},
  doi       = {10.1007/978-3-642-23808-6_36},
  url       = {https://mlanthology.org/ecmlpkdd/2011/zhang2011ecmlpkdd-learning/}
}