Supervised Contrastive Learning for Pre-Trained Language Model Fine-Tuning

Abstract

State-of-the-art natural language understanding classification models follow a two-stage approach: pre-training a large language model on an auxiliary task, and then fine-tuning the model on a task-specific labeled dataset using the cross-entropy loss. However, the cross-entropy loss has several shortcomings that can lead to sub-optimal generalization and instability. Driven by the intuition that good generalization requires capturing the similarity between examples in one class and contrasting them with examples in other classes, we propose a supervised contrastive learning (SCL) objective for the fine-tuning stage. Combined with cross-entropy, our proposed SCL loss obtains significant improvements over a strong RoBERTa-Large baseline on multiple datasets of the GLUE benchmark in few-shot learning settings, without requiring specialized architecture, data augmentations, memory banks, or additional unsupervised data. Our proposed fine-tuning objective leads to models that are more robust to different levels of noise in the fine-tuning training data, and that generalize better to related tasks with limited labeled data.
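
The abstract describes adding a supervised contrastive term to the usual cross-entropy objective during fine-tuning. Below is a minimal PyTorch-style sketch of that combination, assuming batch-level supervised contrastive learning over L2-normalized encoder outputs (e.g., the [CLS] embedding); the temperature `tau`, mixing weight `lam`, and their default values are illustrative placeholders rather than settings taken from the paper, and this is not the authors' released code.

```python
import torch
import torch.nn.functional as F


def supervised_contrastive_loss(embeddings, labels, tau=0.3):
    """Pull same-label embeddings together and push different-label ones apart.

    embeddings: (N, d) encoder outputs for a batch; labels: (N,) class ids.
    """
    z = F.normalize(embeddings, dim=1)                    # L2-normalize embeddings
    sim = (z @ z.T) / tau                                 # pairwise similarities / temperature
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))       # drop i == k terms from the softmax
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)         # number of positives per anchor
    loss = -log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_counts
    return loss.mean()


def fine_tuning_loss(logits, embeddings, labels, lam=0.5):
    """Weighted combination: (1 - lam) * cross-entropy + lam * SCL."""
    ce = F.cross_entropy(logits, labels)
    scl = supervised_contrastive_loss(embeddings, labels)
    return (1.0 - lam) * ce + lam * scl
```

In use, `logits` would come from the classification head and `embeddings` from the encoder representation fed to that head; with `lam=0`, the objective reduces to the standard cross-entropy fine-tuning baseline.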

Cite

Text

Gunel et al. "Supervised Contrastive Learning for Pre-Trained Language Model Fine-Tuning." International Conference on Learning Representations, 2021.

Markdown

[Gunel et al. "Supervised Contrastive Learning for Pre-Trained Language Model Fine-Tuning." International Conference on Learning Representations, 2021.](https://mlanthology.org/iclr/2021/gunel2021iclr-supervised/)

BibTeX

@inproceedings{gunel2021iclr-supervised,
  title     = {{Supervised Contrastive Learning for Pre-Trained Language Model Fine-Tuning}},
  author    = {Gunel, Beliz and Du, Jingfei and Conneau, Alexis and Stoyanov, Veselin},
  booktitle = {International Conference on Learning Representations},
  year      = {2021},
  url       = {https://mlanthology.org/iclr/2021/gunel2021iclr-supervised/}
}