Active Learning from Crowds

Abstract

Obtaining labels is often expensive or time-consuming, whereas unlabeled data is usually abundant and easy to obtain. Many learning tasks can profit from intelligently choosing which unlabeled instances an oracle should label, an approach known as active learning, instead of labeling all the data or selecting instances to label at random. Supervised learning traditionally relies on an oracle playing the role of a teacher. In the multiple-annotator paradigm, an oracle who knows the ground truth no longer exists; instead, multiple labelers with varying expertise are available for querying. This paradigm poses new challenges for the active learning scenario: we can ask both which data sample should be labeled next and which annotator we should query to benefit our learning model the most. In this paper, we develop a probabilistic model for learning from multiple annotators that can also learn the annotators' expertise, even when their expertise may not be consistently accurate (or inaccurate) across the task domain. In addition, we provide an optimization formulation that allows us to simultaneously learn the most uncertain sample and the annotator(s) from whom to query its label for active learning. Our active learning approach combines intelligently selecting samples to label with learning from the expertise of multiple labelers to improve learning performance.

Cite

Text

Yan et al. "Active Learning from Crowds." International Conference on Machine Learning, 2011.

Markdown

[Yan et al. "Active Learning from Crowds." International Conference on Machine Learning, 2011.](https://mlanthology.org/icml/2011/yan2011icml-active/)

BibTeX

@inproceedings{yan2011icml-active,
  title     = {{Active Learning from Crowds}},
  author    = {Yan, Yan and Rosales, Rómer and Fung, Glenn and Dy, Jennifer G.},
  booktitle = {International Conference on Machine Learning},
  year      = {2011},
  pages     = {1161-1168},
  url       = {https://mlanthology.org/icml/2011/yan2011icml-active/}
}