Generalizing and Improving Bilingual Word Embedding Mappings with a Multi-Step Framework of Linear Transformations

Abstract

Using a dictionary to map independently trained word embeddings to a shared space has been shown to be an effective approach for learning bilingual word embeddings. In this work, we propose a multi-step framework of linear transformations that generalizes a substantial body of previous work. The core step of the framework is an orthogonal transformation, and existing methods can be explained in terms of the additional normalization, whitening, re-weighting, de-whitening and dimensionality reduction steps. This allows us to gain new insights into the behavior of existing methods, including the effectiveness of inverse regression, and to design a novel variant that obtains the best published results in zero-shot bilingual lexicon extraction. The corresponding software is released as an open-source project.
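
As a rough illustration of the core orthogonal step mentioned in the abstract, the sketch below solves the standard orthogonal Procrustes problem with NumPy on toy data. The function names (`length_normalize`, `orthogonal_map`), the random toy embeddings, and the use of unit-length normalization are assumptions made for illustration only. This is not the authors' released software, and the whitening, re-weighting, de-whitening and dimensionality reduction steps are omitted.

```python
import numpy as np


def length_normalize(emb):
    """Scale each row (one word vector per row) to unit Euclidean length."""
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    return emb / np.maximum(norms, 1e-12)


def orthogonal_map(x, z):
    """Return the orthogonal W minimizing ||x @ W - z||_F (orthogonal Procrustes).

    x, z: arrays of shape (n_pairs, dim) whose i-th rows are the source and
    target embeddings of the i-th dictionary entry.
    """
    u, _, vt = np.linalg.svd(x.T @ z)
    return u @ vt


# Toy data (hypothetical): the "target" space is an exact rotation of the source.
rng = np.random.default_rng(0)
x = length_normalize(rng.standard_normal((50, 4)))
q, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # hidden orthogonal map
z = x @ q

w = orthogonal_map(x, z)
```

In this noise-free toy setting the recovered `w` should coincide with the hidden map `q` up to floating-point error. With real embeddings the fit is only approximate, and mapped source vectors `x @ w` would typically be compared to target vectors by nearest-neighbor retrieval.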

Cite

Text

Artetxe et al. "Generalizing and Improving Bilingual Word Embedding Mappings with a Multi-Step Framework of Linear Transformations." AAAI Conference on Artificial Intelligence, 2018. doi:10.1609/AAAI.V32I1.11992

Markdown

[Artetxe et al. "Generalizing and Improving Bilingual Word Embedding Mappings with a Multi-Step Framework of Linear Transformations." AAAI Conference on Artificial Intelligence, 2018.](https://mlanthology.org/aaai/2018/artetxe2018aaai-generalizing/) doi:10.1609/AAAI.V32I1.11992

BibTeX

@inproceedings{artetxe2018aaai-generalizing,
  title     = {{Generalizing and Improving Bilingual Word Embedding Mappings with a Multi-Step Framework of Linear Transformations}},
  author    = {Artetxe, Mikel and Labaka, Gorka and Agirre, Eneko},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2018},
  pages     = {5012--5019},
  doi       = {10.1609/AAAI.V32I1.11992},
  url       = {https://mlanthology.org/aaai/2018/artetxe2018aaai-generalizing/}
}