Spark-Crowd: A Spark Package for Learning from Crowdsourced Big Data

Abstract

As data sets grow in size, manually labeling them becomes infeasible for small groups of experts. It is therefore common to rely on crowdsourcing platforms, which provide inexpensive but noisy labels. Although implementations of algorithms that tackle this problem exist, none of them focuses on scalability, which limits their application to relatively small data sets. In this paper, we present spark-crowd, an Apache Spark package for learning from crowdsourced data with scalability in mind.

Cite

Text

Rodrigo et al. "Spark-Crowd: A Spark Package for Learning from Crowdsourced Big Data." Machine Learning Open Source Software, 2019.

Markdown

[Rodrigo et al. "Spark-Crowd: A Spark Package for Learning from Crowdsourced Big Data." Machine Learning Open Source Software, 2019.](https://mlanthology.org/mloss/2019/rodrigo2019jmlr-sparkcrowd/)

BibTeX

@article{rodrigo2019jmlr-sparkcrowd,
  title     = {{Spark-Crowd: A Spark Package for Learning from Crowdsourced Big Data}},
  author    = {Rodrigo, Enrique G. and Aledo, Juan A. and Gámez, José A.},
  journal   = {Machine Learning Open Source Software},
  year      = {2019},
  pages     = {1--5},
  volume    = {20},
  url       = {https://mlanthology.org/mloss/2019/rodrigo2019jmlr-sparkcrowd/}
}