SPADE: A Semi-Supervised Probabilistic Approach for Detecting Errors in Tables

Minh Pham, Craig A. Knoblock, Muhao Chen, Binh Vu, Jay Pujara

IJCAI 2021 pp. 3543-3551

doi:10.24963/IJCAI.2021/488

Abstract

Error detection is one of the most important steps in data cleaning and usually requires extensive human interaction to ensure quality. Existing supervised methods in error detection require a significant amount of training data while unsupervised methods rely on fixed inductive biases, which are usually hard to generalize, to solve the problem. In this paper, we present SPADE, a novel semi-supervised probabilistic approach for error detection. SPADE introduces a novel probabilistic active learning model, where the system suggests examples to be labeled based on the agreements between user labels and indicative signals, which are designed to capture potential errors. SPADE uses a two-phase data augmentation process to enrich a dataset before training a deep learning classifier to detect unlabeled errors. In our evaluation, SPADE achieves an average F1-score of 0.91 over five datasets and yields a 10% improvement compared with the state-of-the-art systems.

Cite

Text

Pham et al. "SPADE: A Semi-Supervised Probabilistic Approach for Detecting Errors in Tables." International Joint Conference on Artificial Intelligence, 2021. doi:10.24963/IJCAI.2021/488

Markdown

[Pham et al. "SPADE: A Semi-Supervised Probabilistic Approach for Detecting Errors in Tables." International Joint Conference on Artificial Intelligence, 2021.](https://mlanthology.org/ijcai/2021/pham2021ijcai-spade/) doi:10.24963/IJCAI.2021/488

BibTeX

@inproceedings{pham2021ijcai-spade,
  title     = {{SPADE: A Semi-Supervised Probabilistic Approach for Detecting Errors in Tables}},
  author    = {Pham, Minh and Knoblock, Craig A. and Chen, Muhao and Vu, Binh and Pujara, Jay},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2021},
  pages     = {3543--3551},
  doi       = {10.24963/IJCAI.2021/488},
  url       = {https://mlanthology.org/ijcai/2021/pham2021ijcai-spade/}
}