An Empirical Analysis Towards Replacing Vocabulary-Rigid Embeddings by a Vocabulary-Free Mechanism

Abstract

This paper addresses the limitations of subword-based models in NLP by aligning the word embedding layer of a vocabulary-rigid transformer with a vocabulary-free mechanism. To do so, a CNN is trained to mimic the word embedding layer of a BERT model, taking a sequence of byte tokens as input. The study compares cosine-based and Euclidean-based loss functions for training the student network and finds that cosine-based metrics yield better results. The work contributes techniques for re-training transformer embedding layers and provides insights into loss function selection, with implications for building more flexible and robust NLP models.
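The abstract's setup, a byte-level CNN student trained to reproduce a frozen BERT embedding layer under a cosine-based objective, can be illustrated with a minimal sketch. This is not the authors' code; the module name `ByteCNNEmbedder`, the kernel sizes, the pooling scheme, and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's implementation): a byte-level CNN
# "student" trained to reproduce the token embeddings of a frozen BERT "teacher".
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast

class ByteCNNEmbedder(nn.Module):
    """Maps the UTF-8 bytes of a token to a vector the size of BERT's embeddings."""
    def __init__(self, embed_dim=768, byte_dim=64, channels=256, max_bytes=32):
        super().__init__()
        self.max_bytes = max_bytes
        self.byte_embed = nn.Embedding(256 + 1, byte_dim, padding_idx=256)  # index 256 = pad
        self.convs = nn.ModuleList(
            nn.Conv1d(byte_dim, channels, kernel_size=k, padding=k // 2) for k in (3, 5, 7)
        )
        self.proj = nn.Linear(len(self.convs) * channels, embed_dim)

    def forward(self, byte_ids):                        # (batch, max_bytes)
        x = self.byte_embed(byte_ids).transpose(1, 2)   # (batch, byte_dim, max_bytes)
        feats = [conv(x).amax(dim=-1) for conv in self.convs]  # max-pool over byte positions
        return self.proj(torch.cat(feats, dim=-1))      # (batch, embed_dim)

def to_bytes(token, max_bytes=32):
    """UTF-8 byte ids of a token string, truncated/padded to a fixed length."""
    b = list(token.encode("utf-8"))[:max_bytes]
    return b + [256] * (max_bytes - len(b))

# Frozen teacher: BERT's input word-embedding matrix.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
teacher = BertModel.from_pretrained("bert-base-uncased").get_input_embeddings()
teacher.weight.requires_grad_(False)

student = ByteCNNEmbedder()
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

# Cosine-based objective (the loss family the abstract reports as working better):
# pull each student vector toward the teacher embedding of the same token.
cos_loss = nn.CosineEmbeddingLoss()

vocab = tokenizer.get_vocab()                           # token string -> id
tokens, ids = zip(*list(vocab.items())[:4096])          # small demo slice of the vocabulary
byte_ids = torch.tensor([to_bytes(t) for t in tokens])
target = teacher(torch.tensor(ids))

for step in range(100):
    opt.zero_grad()
    pred = student(byte_ids)
    loss = cos_loss(pred, target, torch.ones(len(tokens)))
    loss.backward()
    opt.step()
```

Swapping `nn.CosineEmbeddingLoss` for `nn.MSELoss()(pred, target)` gives the Euclidean-based variant the paper compares against.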

Cite

Text

Perez et al. "An Empirical Analysis Towards Replacing Vocabulary-Rigid Embeddings by a Vocabulary-Free Mechanism." ICML 2023 Workshops: LXAI_Regular_Deadline, 2023.

Markdown

[Perez et al. "An Empirical Analysis Towards Replacing Vocabulary-Rigid Embeddings by a Vocabulary-Free Mechanism." ICML 2023 Workshops: LXAI_Regular_Deadline, 2023.](https://mlanthology.org/icmlw/2023/perez2023icmlw-empirical/)

BibTeX

@inproceedings{perez2023icmlw-empirical,
  title     = {{An Empirical Analysis Towards Replacing Vocabulary-Rigid Embeddings by a Vocabulary-Free Mechanism}},
  author    = {Perez, Alejandro Rodriguez and Sooksatra, Korn and Rivas, Pablo and Caballero, Ernesto Quevedo and Turek, Javier S. and Bichler, Gisela and Cerny, Tomas and Giddens, Laurie and Petter, Stacie},
  booktitle = {ICML 2023 Workshops: LXAI_Regular_Deadline},
  year      = {2023},
  url       = {https://mlanthology.org/icmlw/2023/perez2023icmlw-empirical/}
}