Exploring Contrastive Learning for Long-Tailed Multi-Label Text Classification

Abstract

Learning an effective representation in multi-label text classification (MLTC) is a significant challenge in natural language processing. This challenge arises from the inherent complexity of the task, which is shaped by two key factors: the intricate connections between labels and the widespread long-tailed distribution of the data. To address this challenge, one potential approach involves integrating supervised contrastive learning with classical supervised loss functions. Although contrastive learning has shown remarkable performance in multi-class classification, its impact in the multi-label framework has not been thoroughly investigated. In this paper, we conduct an in-depth study of supervised contrastive learning and its influence on representation in the MLTC context. We emphasize the importance of considering long-tailed data distributions to build a robust representation space, and we identify two critical challenges associated with contrastive learning: the “lack of positives” and the “attraction-repulsion imbalance”. Building on these insights, we introduce a novel contrastive loss function for MLTC. It attains Micro-F1 scores that either match or surpass those obtained with other frequently employed loss functions, and demonstrates a significant improvement in Macro-F1 scores across four multi-label datasets.
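
The abstract reports both Micro-F1 and Macro-F1 because the two metrics react differently to long-tailed labels. The following is a minimal illustrative sketch, not taken from the paper: the function names `f1` and `micro_macro_f1` and the toy data are assumptions made for this example, and an undefined F1 (no positives and no predictions) is treated as 0 by convention here. It shows how a classifier that ignores a rare label can keep a high Micro-F1 while its Macro-F1 drops sharply.

```python
from typing import List, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; returns 0.0 when the score is undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(
    y_true: List[List[int]], y_pred: List[List[int]]
) -> Tuple[float, float]:
    """Compute (micro-F1, macro-F1) for binary indicator label matrices."""
    n_labels = len(y_true[0])
    counts = []
    for j in range(n_labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        counts.append((tp, fp, fn))
    # Micro: pool the counts over all labels, then compute a single F1.
    micro = f1(
        sum(c[0] for c in counts),
        sum(c[1] for c in counts),
        sum(c[2] for c in counts),
    )
    # Macro: compute F1 per label, then average with equal weight per label.
    macro = sum(f1(*c) for c in counts) / n_labels
    return micro, macro


# Hypothetical toy data: label 0 is frequent (head), label 1 is rare (tail).
y_true = [[1, 0]] * 8 + [[1, 1]] * 2
# A classifier that always predicts the head label and never the tail label.
y_pred = [[1, 0]] * 10

micro, macro = micro_macro_f1(y_true, y_pred)
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
```

In this toy setting the pooled counts are dominated by the head label, so Micro-F1 stays near 0.91, while the tail label contributes an F1 of 0 and pulls Macro-F1 down to 0.5. This is why an improvement in Macro-F1 is usually read as evidence of better handling of rare labels.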

Cite

Text

Audibert et al. "Exploring Contrastive Learning for Long-Tailed Multi-Label Text Classification." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2024. doi:10.1007/978-3-031-70368-3_15

Markdown

[Audibert et al. "Exploring Contrastive Learning for Long-Tailed Multi-Label Text Classification." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2024.](https://mlanthology.org/ecmlpkdd/2024/audibert2024ecmlpkdd-exploring/) doi:10.1007/978-3-031-70368-3_15

BibTeX

@inproceedings{audibert2024ecmlpkdd-exploring,
  title     = {{Exploring Contrastive Learning for Long-Tailed Multi-Label Text Classification}},
  author    = {Audibert, Alexandre and Gauffre, Aurélien and Amini, Massih-Reza},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2024},
  pages     = {245--261},
  doi       = {10.1007/978-3-031-70368-3_15},
  url       = {https://mlanthology.org/ecmlpkdd/2024/audibert2024ecmlpkdd-exploring/}
}