IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization
Abstract
Fine-tuning pre-trained language models (PTLMs), such as BERT and its better variant RoBERTa, has been a common practice for advancing performance in natural language understanding (NLU) tasks. Recent advances in representation learning show that isotropic (i.e., unit-variance and uncorrelated) embeddings can significantly improve performance on downstream tasks, with faster convergence and better generalization. The isotropy of the pre-trained embeddings in PTLMs, however, is relatively under-explored. In this paper, we analyze the isotropy of the pre-trained [CLS] embeddings of PTLMs with straightforward visualization, and point out two major issues: high variance in their standard deviation, and high correlation between different dimensions. We also propose a new network regularization method, isotropic batch normalization (IsoBN), to address these issues, towards learning more isotropic representations in fine-tuning by dynamically penalizing dominating principal components. This simple yet effective fine-tuning method yields about a 1.0-point absolute improvement on the average of seven NLU tasks.
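As a rough illustration of the ideas in the abstract, the sketch below (in PyTorch) measures the two reported isotropy issues of a batch of [CLS] embeddings (per-dimension standard deviation and inter-dimension correlation) and applies a simplified whitening-style rescaling of dominant principal components. This is an assumption-laden sketch for intuition only, not the authors' exact IsoBN algorithm; the function names and the `power` parameter are hypothetical.

```python
# Sketch only: NOT the authors' IsoBN. Illustrates (1) measuring isotropy of
# [CLS] embeddings and (2) softly downscaling dominant principal components.
import torch


def isotropy_stats(h: torch.Tensor):
    """Return per-dimension std and mean absolute off-diagonal correlation
    for a batch of embeddings h of shape (batch, hidden)."""
    std = h.std(dim=0)
    corr = torch.corrcoef(h.T)                      # (hidden, hidden)
    off_diag = corr - torch.diag(torch.diag(corr))
    return std, off_diag.abs().mean()


def soft_whiten(h: torch.Tensor, power: float = 0.5, eps: float = 1e-5):
    """Rescale each principal component by eigenvalue**(-power).
    power=0.5 is full PCA whitening; smaller values penalize the
    dominating components more gently (hypothetical knob)."""
    h_centered = h - h.mean(dim=0, keepdim=True)
    cov = h_centered.T @ h_centered / (h.shape[0] - 1)
    eigvals, eigvecs = torch.linalg.eigh(cov)       # ascending eigenvalues
    scale = eigvals.clamp(min=eps) ** (-power)
    proj = h_centered @ eigvecs                     # project onto components
    return (proj * scale) @ eigvecs.T               # rescale, rotate back


if __name__ == "__main__":
    # Synthetic anisotropic batch: 512 samples, 64 dims, correlated features.
    h = torch.randn(512, 64) @ torch.randn(64, 64)
    std, mean_corr = isotropy_stats(h)
    h_iso = soft_whiten(h)
    std_iso, mean_corr_iso = isotropy_stats(h_iso)
    print(std.var().item(), mean_corr.item())       # high variance, high corr.
    print(std_iso.var().item(), mean_corr_iso.item())  # closer to isotropic
```

In the paper itself, the rescaling is computed dynamically during fine-tuning from batch statistics (in the spirit of batch normalization) rather than by a full eigendecomposition per batch; the sketch only conveys the target of more isotropic [CLS] representations.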
Cite
Text
Zhou et al. "IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization." AAAI Conference on Artificial Intelligence, 2021. doi:10.1609/AAAI.V35I16.17718
Markdown
[Zhou et al. "IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization." AAAI Conference on Artificial Intelligence, 2021.](https://mlanthology.org/aaai/2021/zhou2021aaai-isobn/) doi:10.1609/AAAI.V35I16.17718
BibTeX
@inproceedings{zhou2021aaai-isobn,
title = {{IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization}},
author = {Zhou, Wenxuan and Lin, Bill Yuchen and Ren, Xiang},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2021},
  pages = {14621--14629},
doi = {10.1609/AAAI.V35I16.17718},
url = {https://mlanthology.org/aaai/2021/zhou2021aaai-isobn/}
}