TOCOL: Improving Contextual Representation of Pre-Trained Language Models via Token-Level Contrastive Learning

Abstract

Self-attention, which allows transformers to capture deep bidirectional contexts, plays a vital role in BERT-like pre-trained language models. However, the maximum likelihood pre-training objective of BERT may produce an anisotropic word embedding space, which leads to biased attention scores for high-frequency tokens: because they lie very close to each other in representation space, they receive inflated similarities. This bias may ultimately affect the encoding of global contextual information. To address this issue, we propose TOCOL, a TOken-Level COntrastive Learning framework for improving the contextual representation of pre-trained language models, which integrates a novel self-supervised objective into the attention mechanism to reshape the word representation space and encourages the PLM to capture the global semantics of sentences. Results on the GLUE benchmark show that TOCOL brings considerable improvement over the original BERT. Furthermore, we conduct a detailed analysis and demonstrate the robustness of our approach in low-resource scenarios.
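The abstract describes a token-level contrastive objective that pulls aligned token representations together and pushes apart other tokens in the sentence. As a rough illustration only (the paper's exact objective, positive/negative construction, and integration with attention may differ), here is a minimal sketch of a token-level InfoNCE-style loss over two views of a sentence's token embeddings; all names and shapes are hypothetical:

```python
# Hypothetical token-level InfoNCE sketch. Assumes two "views" of the same
# sentence's token embeddings (e.g., from two dropout-perturbed forward
# passes); this is NOT the paper's exact formulation.
import math


def token_contrastive_loss(view_a, view_b, temperature=0.1):
    """view_a, view_b: equal-length lists of token embedding vectors.

    Each token in view_a treats its counterpart in view_b as the positive
    and every other token in the sentence as a negative, discouraging
    tokens from collapsing into a narrow (anisotropic) region.
    """
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    def cos(u, v):
        return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cos(view_a[i], view_b[j]) / temperature for j in range(n)]
        # log-sum-exp with max-shift for numerical stability
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        # cross-entropy with the aligned token j == i as the positive class
        total += log_z - logits[i]
    return total / n
```

With identical views the loss is near zero, while misaligned views are penalized, which matches the intuition of reshaping the space so distinct tokens stay distinguishable.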

Cite

Text

Wang et al. "TOCOL: Improving Contextual Representation of Pre-Trained Language Models via Token-Level Contrastive Learning." Machine Learning, 2024. doi:10.1007/s10994-023-06512-9

Markdown

[Wang et al. "TOCOL: Improving Contextual Representation of Pre-Trained Language Models via Token-Level Contrastive Learning." Machine Learning, 2024.](https://mlanthology.org/mlj/2024/wang2024mlj-tocol/) doi:10.1007/s10994-023-06512-9

BibTeX

@article{wang2024mlj-tocol,
  title     = {{TOCOL: Improving Contextual Representation of Pre-Trained Language Models via Token-Level Contrastive Learning}},
  author    = {Wang, Keheng and Yin, Chuantao and Li, Rumei and Wang, Sirui and Xian, Yunsen and Rong, Wenge and Xiong, Zhang},
  journal   = {Machine Learning},
  year      = {2024},
  pages     = {3999--4012},
  doi       = {10.1007/s10994-023-06512-9},
  volume    = {113},
  url       = {https://mlanthology.org/mlj/2024/wang2024mlj-tocol/}
}