Iterative Learning of Parallel Lexicons and Phrases from Non-Parallel Corpora

Abstract

While parallel corpora are an indispensable resource for data-driven multilingual natural language processing tasks such as machine translation, they are limited in quantity, quality, and coverage. As a result, learning translation models from non-parallel corpora has become increasingly important, especially for low-resource languages. In this work, we propose a joint model for iteratively learning parallel lexicons and phrases from non-parallel corpora. The model is trained using a Viterbi EM algorithm that alternates between constructing parallel phrases using lexicons and updating lexicons based on the constructed parallel phrases. Experiments on Chinese-English datasets show that our approach learns better parallel lexicons and phrases and improves translation performance significantly.
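The alternation the abstract describes can be illustrated with a minimal, hypothetical sketch: a hard (Viterbi) E-step that pairs each source phrase with its best-scoring target phrase under the current lexicon, followed by an M-step that re-estimates lexical translation probabilities from the induced pairs. The scoring function, toy data, and all names below are illustrative assumptions, not the paper's actual model.

```python
# Hypothetical sketch of a Viterbi-EM loop for inducing parallel phrases and
# a lexicon from non-parallel phrase sets. Not the paper's implementation.
from collections import defaultdict

def score(src_phrase, tgt_phrase, lexicon):
    """Average best lexical translation probability over source words
    (a simple stand-in for the paper's phrase-pair scoring)."""
    total = 0.0
    for s in src_phrase:
        total += max(lexicon.get((s, t), 1e-6) for t in tgt_phrase)
    return total / len(src_phrase)

def viterbi_em(src_phrases, tgt_phrases, lexicon, iterations=3):
    pairs = []
    for _ in range(iterations):
        # E-step (Viterbi): construct parallel phrases with the current lexicon
        # by hard-assigning each source phrase to its best target phrase.
        pairs = [(s, max(tgt_phrases, key=lambda t: score(s, t, lexicon)))
                 for s in src_phrases]
        # M-step: update the lexicon from word co-occurrence counts
        # in the constructed phrase pairs.
        counts = defaultdict(float)
        totals = defaultdict(float)
        for s_phrase, t_phrase in pairs:
            for s in s_phrase:
                for t in t_phrase:
                    counts[(s, t)] += 1.0
                    totals[s] += 1.0
        lexicon = {st: c / totals[st[0]] for st, c in counts.items()}
    return pairs, lexicon

# Toy usage with a small seed lexicon (romanized Chinese words, for illustration):
pairs, lex = viterbi_em(
    src_phrases=[("gou",), ("mao",)],
    tgt_phrases=[("dog",), ("cat",)],
    lexicon={("gou", "dog"): 0.9, ("mao", "cat"): 0.9},
)
```

In a realistic setting the seed lexicon would come from a small dictionary or induced word translations, and the E-step would search over candidate phrases mined from the monolingual corpora rather than a fixed list.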

Cite

Text

Dong et al. "Iterative Learning of Parallel Lexicons and Phrases from Non-Parallel Corpora." International Joint Conference on Artificial Intelligence, 2015.

Markdown

[Dong et al. "Iterative Learning of Parallel Lexicons and Phrases from Non-Parallel Corpora." International Joint Conference on Artificial Intelligence, 2015.](https://mlanthology.org/ijcai/2015/dong2015ijcai-iterative/)

BibTeX

@inproceedings{dong2015ijcai-iterative,
  title     = {{Iterative Learning of Parallel Lexicons and Phrases from Non-Parallel Corpora}},
  author    = {Dong, Meiping and Liu, Yang and Luan, Huan-Bo and Sun, Maosong and Izuha, Tatsuya and Zhang, Dakun},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2015},
  pages     = {1250--1256},
  url       = {https://mlanthology.org/ijcai/2015/dong2015ijcai-iterative/}
}