Normalization of Language Embeddings for Cross-Lingual Alignment

Abstract

Learning a good transfer function that maps the word vectors of two languages into a shared cross-lingual word vector space plays a crucial role in cross-lingual NLP. It is useful in translation tasks, and it allows complex models built on a high-resource language like English to be applied directly to an aligned low-resource language. While Procrustes and other techniques can align language models with some success, it has recently been identified that structural differences (for instance, due to differing word frequency) create different profiles for various monolingual embeddings. The degree to which these profiles differ across languages correlates with how well the languages can be aligned and with their performance on cross-lingual downstream tasks. In this work, we develop a very general language embedding normalization procedure, building on and subsuming various previous approaches, which removes these structural profiles across languages without destroying their intrinsic meaning. We demonstrate that meaning is retained and alignment is improved on similarity, translation, and cross-language classification tasks. Our proposed normalization clearly outperforms all prior approaches, such as centering and vector normalization, on each task and with each alignment approach.
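
For readers unfamiliar with the baselines named in the abstract, the following is a minimal sketch of centering, vector (length) normalization, and orthogonal Procrustes alignment. It is not the normalization procedure proposed in the paper. The function names (`center`, `unit_length`, `procrustes`) and the assumption that rows of the two matrices are paired translations are illustrative choices, not taken from the source.

```python
import numpy as np

def center(X):
    # Subtract the mean word vector so the embedding cloud is centered at the origin.
    return X - X.mean(axis=0, keepdims=True)

def unit_length(X, eps=1e-12):
    # Scale every word vector (row) to unit Euclidean norm.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, eps)

def procrustes(X, Y):
    # Orthogonal Procrustes: the orthogonal W minimizing ||X W - Y||_F,
    # assuming row i of X and row i of Y are a translation pair (an assumption here).
    # With the SVD X^T Y = U S V^T, the minimizer is W = U V^T.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Hypothetical toy data: 50 "words" in 4 dimensions, with the target space
# being an exact rotation of the normalized source space.
rng = np.random.default_rng(0)
src = unit_length(center(rng.normal(size=(50, 4))))
rotation, _ = np.linalg.qr(rng.normal(size=(4, 4)))
tgt = src @ rotation

W = procrustes(src, tgt)
residual = np.linalg.norm(src @ W - tgt)  # close to zero for this synthetic case
```

In this sketch length normalization is applied after centering, so the rows have unit norm but the mean is generally no longer exactly zero. Real embedding pairs are not exact rotations of one another, so the residual would not vanish as it does on this toy data.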

Cite

Text

Aboagye et al. "Normalization of Language Embeddings for Cross-Lingual Alignment." International Conference on Learning Representations, 2022.

Markdown

[Aboagye et al. "Normalization of Language Embeddings for Cross-Lingual Alignment." International Conference on Learning Representations, 2022.](https://mlanthology.org/iclr/2022/aboagye2022iclr-normalization/)

BibTeX

@inproceedings{aboagye2022iclr-normalization,
  title     = {{Normalization of Language Embeddings for Cross-Lingual Alignment}},
  author    = {Aboagye, Prince Osei and Zheng, Yan and Yeh, Chin-Chia Michael and Wang, Junpeng and Zhang, Wei and Wang, Liang and Yang, Hao and Phillips, Jeff},
  booktitle = {International Conference on Learning Representations},
  year      = {2022},
  url       = {https://mlanthology.org/iclr/2022/aboagye2022iclr-normalization/}
}