SimCTC: A Simple Contrast Learning Method of Text Clustering (Student Abstract)

Abstract

This paper presents SimCTC, a simple contrastive learning (CL) framework that substantially advances state-of-the-art text clustering models. In SimCTC, a pre-trained BERT model first maps the input sequence to the representation space, which is then followed by three different loss function heads: a Clustering head, an Instance-CL head, and a Cluster-CL head. Experimental results on multiple benchmark datasets demonstrate that SimCTC outperforms six competitive text clustering methods, with improvements of 1%-6% in accuracy (ACC) and 1%-4% in normalized mutual information (NMI). Moreover, our results also show that clustering performance can be further improved by setting an appropriate number of clusters in the cluster-level objective.
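The instance-level and cluster-level CL heads the abstract mentions can be illustrated with a minimal pure-Python sketch, assuming an NT-Xent-style contrastive loss (as popularized by SimCLR): the instance-level objective contrasts rows of the embedding matrices of two augmented views, while the cluster-level objective contrasts the columns of their soft cluster-assignment matrices. All function names and toy values below are illustrative assumptions, not the paper's actual implementation.

```python
import math

def cosine(u, v):
    # Cosine similarity between two vectors given as plain lists.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nt_xent(views_a, views_b, tau=0.5):
    # NT-Xent contrastive loss over two aligned lists of vectors:
    # views_a[i] and views_b[i] form a positive pair; every other
    # vector in the combined 2N batch serves as a negative.
    batch = views_a + views_b
    n = len(views_a)
    total = 0.0
    for i, anchor in enumerate(batch):
        pos = batch[(i + n) % (2 * n)]  # the paired view of this anchor
        num = math.exp(cosine(anchor, pos) / tau)
        den = sum(math.exp(cosine(anchor, other) / tau)
                  for j, other in enumerate(batch) if j != i)
        total += -math.log(num / den)
    return total / (2 * n)

# Instance-level CL: embeddings of two augmentations of two sentences.
za = [[1.0, 0.0], [0.0, 1.0]]
zb = [[0.9, 0.1], [0.1, 0.9]]
loss_instance = nt_xent(za, zb)

# Cluster-level CL: columns of the (batch x K) soft-assignment
# matrices act as cluster representations to be contrasted.
def columns(m):
    return [list(c) for c in zip(*m)]

pa = [[0.9, 0.1], [0.2, 0.8]]
pb = [[0.8, 0.2], [0.1, 0.9]]
loss_cluster = nt_xent(columns(pa), columns(pb))
```

Because the positives in `za`/`zb` are nearly aligned, the loss is lower than it would be for mismatched pairs; in the full model these two terms would be combined with the clustering-head objective into a single training loss.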

Cite

Text

Li et al. "SimCTC: A Simple Contrast Learning Method of Text Clustering (Student Abstract)." AAAI Conference on Artificial Intelligence, 2022. doi:10.1609/AAAI.V36I11.21635

Markdown

[Li et al. "SimCTC: A Simple Contrast Learning Method of Text Clustering (Student Abstract)." AAAI Conference on Artificial Intelligence, 2022.](https://mlanthology.org/aaai/2022/li2022aaai-simctc/) doi:10.1609/AAAI.V36I11.21635

BibTeX

@inproceedings{li2022aaai-simctc,
  title     = {{SimCTC: A Simple Contrast Learning Method of Text Clustering (Student Abstract)}},
  author    = {Li, Chen and Yu, Xiaoguang and Song, Shuangyong and Wang, Jia and Zou, Bo and He, Xiaodong},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2022},
  pages     = {12997-12998},
  doi       = {10.1609/AAAI.V36I11.21635},
  url       = {https://mlanthology.org/aaai/2022/li2022aaai-simctc/}
}