IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization

Abstract

Fine-tuning pre-trained language models (PTLMs), such as BERT and its improved variant RoBERTa, has been a common practice for advancing performance in natural language understanding (NLU) tasks. Recent advances in representation learning show that isotropic (i.e., unit-variance and uncorrelated) embeddings can significantly improve performance on downstream tasks, with faster convergence and better generalization. The isotropy of the pre-trained embeddings in PTLMs, however, is relatively under-explored. In this paper, we analyze the isotropy of the pre-trained [CLS] embeddings of PTLMs with straightforward visualization, and point out two major issues: high variance in their standard deviations, and high correlation between different dimensions. We also propose a new network regularization method, isotropic batch normalization (IsoBN), to address these issues, towards learning more isotropic representations in fine-tuning by dynamically penalizing dominating principal components. This simple yet effective fine-tuning method yields an absolute improvement of about 1.0 point on the average of seven NLU tasks.
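The sketch below illustrates the two isotropy issues named in the abstract (spread of per-dimension standard deviations and inter-dimension correlation of [CLS] embeddings) together with a simplified isotropy penalty that discourages dominating principal components. It is a minimal sketch under assumptions, not the authors' released IsoBN implementation; names such as isotropy_diagnostics and isobn_like_penalty are illustrative only.

import torch

def isotropy_diagnostics(cls_embeddings: torch.Tensor) -> dict:
    # cls_embeddings: (batch, hidden) pre-trained [CLS] vectors.
    std = cls_embeddings.std(dim=0)              # per-dimension standard deviation
    corr = torch.corrcoef(cls_embeddings.T)      # (hidden, hidden) correlation matrix
    off_diag = corr - torch.diag(torch.diag(corr))
    return {
        # large ratio => highly uneven variances across dimensions
        "std_spread": (std.max() / std.min().clamp_min(1e-8)).item(),
        # large value => strong correlation between different dimensions
        "mean_abs_corr": off_diag.abs().mean().item(),
    }

def isobn_like_penalty(cls_embeddings: torch.Tensor) -> torch.Tensor:
    # Simplified isotropy penalty (not the paper's exact regularizer):
    # push the batch covariance of [CLS] embeddings toward the identity,
    # i.e., unit variance and zero cross-dimension correlation, which
    # suppresses dominating principal components.
    x = cls_embeddings - cls_embeddings.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / max(x.size(0) - 1, 1)
    d = cov.size(0)
    return ((cov - torch.eye(d, device=cov.device)) ** 2).sum() / d

# Usage sketch during fine-tuning (lambda_iso is a hypothetical weight):
# loss = task_loss + lambda_iso * isobn_like_penalty(cls_batch)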

Cite

Text

Zhou et al. "IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization." AAAI Conference on Artificial Intelligence, 2021. doi:10.1609/AAAI.V35I16.17718

Markdown

[Zhou et al. "IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization." AAAI Conference on Artificial Intelligence, 2021.](https://mlanthology.org/aaai/2021/zhou2021aaai-isobn/) doi:10.1609/AAAI.V35I16.17718

BibTeX

@inproceedings{zhou2021aaai-isobn,
  title     = {{IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization}},
  author    = {Zhou, Wenxuan and Lin, Bill Yuchen and Ren, Xiang},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2021},
  pages     = {14621--14629},
  doi       = {10.1609/AAAI.V35I16.17718},
  url       = {https://mlanthology.org/aaai/2021/zhou2021aaai-isobn/}
}