A Variational Autoencoding Approach for Inducing Cross-Lingual Word Embeddings
Abstract
Cross-language learning allows one to use training data from one language to build models for another. Many traditional approaches require word-level alignments from parallel corpora; in this paper, we define a general bilingual training objective that requires only a sentence-aligned parallel corpus. We propose a variational autoencoding approach to training bilingual word embeddings. The variational model introduces a continuous latent variable to explicitly model the underlying semantics of parallel sentence pairs and to guide their generation. Our model constrains the bilingual word embeddings so that words of both languages are represented in exactly the same continuous vector space. Empirical results on the task of cross-lingual document classification show that our method is effective.
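The idea sketched in the abstract can be illustrated with a minimal Gaussian VAE over a parallel pair: one shared latent variable z is inferred from the source side and must reconstruct both sides, which ties the two embedding spaces together. This is a toy sketch with assumed linear encoders/decoders and made-up dimensions, not the authors' actual model.

```python
import numpy as np

# Toy sketch (assumed architecture, NOT the paper's exact model): a shared
# continuous latent z underlies a parallel pair (x_src, x_tgt). An encoder
# q(z|x_src) gives a mean and log-variance; the reparameterization trick
# samples z; two per-language decoders reconstruct each side from the SAME z.

rng = np.random.default_rng(0)
d_word, d_latent = 8, 4  # hypothetical sentence-vector and latent sizes

# Hypothetical parameters: a linear encoder and two linear decoders.
W_mu = rng.normal(0, 0.1, (d_latent, d_word))
W_logvar = rng.normal(0, 0.1, (d_latent, d_word))
W_dec_src = rng.normal(0, 0.1, (d_word, d_latent))
W_dec_tgt = rng.normal(0, 0.1, (d_word, d_latent))

def elbo(x_src, x_tgt):
    """Evidence lower bound for one parallel pair under a Gaussian VAE."""
    mu, logvar = W_mu @ x_src, W_logvar @ x_src
    eps = rng.normal(size=d_latent)
    z = mu + np.exp(0.5 * logvar) * eps          # reparameterization trick
    # Gaussian reconstruction terms for both languages from the same z;
    # sharing z is what forces a single cross-lingual semantic space.
    rec_src = -0.5 * np.sum((x_src - W_dec_src @ z) ** 2)
    rec_tgt = -0.5 * np.sum((x_tgt - W_dec_tgt @ z) ** 2)
    # Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior.
    kl = 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)
    return rec_src + rec_tgt - kl

x_src = rng.normal(size=d_word)   # toy "sentence representation", source side
x_tgt = rng.normal(size=d_word)   # toy "sentence representation", target side
print(f"ELBO for one toy pair: {elbo(x_src, x_tgt):.3f}")
```

Training would ascend this bound over all sentence pairs; here only a single forward evaluation is shown.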
Cite
Text
Wei and Deng. "A Variational Autoencoding Approach for Inducing Cross-Lingual Word Embeddings." International Joint Conference on Artificial Intelligence, 2017. doi:10.24963/IJCAI.2017/582
Markdown
[Wei and Deng. "A Variational Autoencoding Approach for Inducing Cross-Lingual Word Embeddings." International Joint Conference on Artificial Intelligence, 2017.](https://mlanthology.org/ijcai/2017/wei2017ijcai-variational/) doi:10.24963/IJCAI.2017/582
BibTeX
@inproceedings{wei2017ijcai-variational,
title = {{A Variational Autoencoding Approach for Inducing Cross-Lingual Word Embeddings}},
author = {Wei, Liang-Chen and Deng, Zhi-Hong},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2017},
pages = {4165--4171},
doi = {10.24963/IJCAI.2017/582},
url = {https://mlanthology.org/ijcai/2017/wei2017ijcai-variational/}
}