Contrast-Enhanced Semi-Supervised Text Classification with Few Labels

Abstract

Traditional text classification requires thousands of annotated examples or an additional Neural Machine Translation (NMT) system, both of which are expensive to obtain in real applications. This paper presents a Contrast-Enhanced Semi-supervised Text Classification (CEST) framework for label-limited settings that does not incorporate any NMT system. We propose a certainty-driven sample selection method and a contrast-enhanced similarity graph to use data more efficiently in self-training, alleviating the annotation-starving problem. The graph imposes a smoothness constraint on the unlabeled data to improve the coherence and accuracy of the pseudo-labels. Moreover, CEST formulates training as a “learning from noisy labels” problem and performs the optimization accordingly. A salient feature of this formulation is that it explicitly suppresses the severe error propagation found in conventional semi-supervised learning. With only 30 labeled examples per class for both the training and validation sets, CEST outperforms the previous state-of-the-art algorithms by 2.11% accuracy and falls within 3.04% accuracy of fully supervised fine-tuning of a pre-trained language model on thousands of labeled examples.
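
As a rough illustration of what certainty-driven sample selection in self-training can look like, the sketch below ranks unlabeled samples by the entropy of the model's predicted class probabilities and keeps the most certain ones as pseudo-labeled examples. This is not the selection rule from the paper, which the abstract does not specify; the entropy criterion, the function name `select_confident`, and the parameter `k` are assumptions made for illustration only.

```python
import numpy as np

def select_confident(probs, k):
    """Hypothetical helper: pick the k samples with the lowest predictive entropy.

    probs: array-like of shape (n_samples, n_classes), each row a probability vector.
    Returns (indices of selected samples, their argmax pseudo-labels).
    """
    probs = np.asarray(probs, dtype=float)
    # Shannon entropy per sample; a small epsilon avoids log(0).
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    # Lower entropy means a more certain prediction; keep the k most certain.
    order = np.argsort(entropy, kind="stable")[:k]
    return order, probs[order].argmax(axis=1)

# Toy predictions for four unlabeled samples over two classes.
probs = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8], [0.05, 0.95]]
idx, pseudo = select_confident(probs, k=2)
print(idx.tolist(), pseudo.tolist())
```

In a self-training loop, the selected samples and their pseudo-labels would be added to the training set for the next round, while uncertain samples stay unlabeled.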

Cite

Text

Tsai et al. "Contrast-Enhanced Semi-Supervised Text Classification with Few Labels." AAAI Conference on Artificial Intelligence, 2022. doi:10.1609/AAAI.V36I10.21391

Markdown

[Tsai et al. "Contrast-Enhanced Semi-Supervised Text Classification with Few Labels." AAAI Conference on Artificial Intelligence, 2022.](https://mlanthology.org/aaai/2022/tsai2022aaai-contrast/) doi:10.1609/AAAI.V36I10.21391

BibTeX

@inproceedings{tsai2022aaai-contrast,
  title     = {{Contrast-Enhanced Semi-Supervised Text Classification with Few Labels}},
  author    = {Tsai, Austin Cheng-Yun and Lin, Sheng-Ya and Fu, Li-Chen},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2022},
  pages     = {11394-11402},
  doi       = {10.1609/AAAI.V36I10.21391},
  url       = {https://mlanthology.org/aaai/2022/tsai2022aaai-contrast/}
}