Estimating Word Translation Probabilities from Unrelated Monolingual Corpora Using the EM Algorithm
Abstract
Selecting the right word translation among several options in the lexicon is a core problem for machine translation. We present a novel approach to this problem that can be trained using only unrelated monolingual corpora and a lexicon. By estimating word translation probabilities using the EM algorithm, we extend upon target language modeling. We construct a word translation model for 3830 German and 6147 English noun tokens, with very promising results.
1. Introduction
Selecting the right word translation among several options in the lexicon is a core problem for machine translation. The problem is related to word sense disambiguation, which tries to determine the correct sense for a word occurrence (e.g. river bank vs. money bank). While the definition of word sense is a tricky issue, the picture is much clearer in translation. If we observe human translators, we can collect up the different ways in which a German word is usually translated into English. In some contexts, ...
Cite
Text
Koehn and Knight. "Estimating Word Translation Probabilities from Unrelated Monolingual Corpora Using the EM Algorithm." AAAI Conference on Artificial Intelligence, 2000.
Markdown
[Koehn and Knight. "Estimating Word Translation Probabilities from Unrelated Monolingual Corpora Using the EM Algorithm." AAAI Conference on Artificial Intelligence, 2000.](https://mlanthology.org/aaai/2000/koehn2000aaai-estimating/)
BibTeX
@inproceedings{koehn2000aaai-estimating,
title = {{Estimating Word Translation Probabilities from Unrelated Monolingual Corpora Using the EM Algorithm}},
author = {Koehn, Philipp and Knight, Kevin},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2000},
pages = {711--715},
url = {https://mlanthology.org/aaai/2000/koehn2000aaai-estimating/}
}