Learning Rules to Improve a Machine Translation System
Abstract
In this paper we show how to learn rules to improve the performance of a machine translation system. Given a system consisting of two translation functions (one from language A to language B and one from B to A), training text is translated from A to B and back again to A. Using these two translations, differences in knowledge between the two translation functions are identified, and rules are learned to improve the functions. Context-independent rules are learned where the information suggests only a single possible translation for a word. When there are multiple alternative translations for a word, a likelihood ratio test is used to identify words that co-occur significantly with each case. These words are then used as context in context-dependent rules. Applied to the Pan American Health Organization corpus of 20,084 sentences, the learned rules improve the understandability of the translation produced by the SDL International engine on 78% of sentences, with high precision.
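The context-selection step can be illustrated with a minimal sketch. The abstract does not give the exact form of the likelihood ratio test, so the code below assumes the common G² log-likelihood ratio statistic on a 2x2 contingency table, with the chi-square critical value for one degree of freedom at p = 0.05 as the cutoff. The function names, the threshold, and the toy "bank" data are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

# Assumption: chi-square critical value, 1 degree of freedom, p = 0.05.
CHI2_1DF_P05 = 3.841


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 table of co-occurrence counts.

    k11: sentences with this translation case that contain the word
    k12: sentences with this translation case that lack the word
    k21: sentences with another case that contain the word
    k22: sentences with another case that lack the word
    """
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing to the sum
                expected = rows[i] * cols[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2


def significant_context(examples, case, threshold=CHI2_1DF_P05):
    """Return words that co-occur with `case` more often than chance.

    `examples` is a list of (case_label, words) pairs, one per sentence.
    """
    in_case = [set(ws) for label, ws in examples if label == case]
    out_case = [set(ws) for label, ws in examples if label != case]
    count_in = Counter(w for ws in in_case for w in ws)
    count_out = Counter(w for ws in out_case for w in ws)
    selected = []
    for word, k11 in count_in.items():
        k12 = len(in_case) - k11
        k21 = count_out[word]
        k22 = len(out_case) - k21
        # Keep only words that are over-represented in this case.
        if k11 * len(out_case) <= k21 * len(in_case):
            continue
        if log_likelihood_ratio(k11, k12, k21, k22) >= threshold:
            selected.append(word)
    return sorted(selected)


# Hypothetical toy data: two translation cases for the English word "bank".
examples = (
    [("banco", ["the", "money", "bank"])] * 8
    + [("orilla", ["the", "river", "bank"])] * 8
)
context = significant_context(examples, "banco")
```

In this toy data, words that appear equally often with both cases ("the", "bank") are filtered out, while a word that appears only with one case ("money") passes the test and could serve as the context of a context-dependent rule.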
Cite
Text
Kauchak and Elkan. "Learning Rules to Improve a Machine Translation System." European Conference on Machine Learning, 2003. doi:10.1007/978-3-540-39857-8_20
Markdown
[Kauchak and Elkan. "Learning Rules to Improve a Machine Translation System." European Conference on Machine Learning, 2003.](https://mlanthology.org/ecmlpkdd/2003/kauchak2003ecml-learning/) doi:10.1007/978-3-540-39857-8_20
BibTeX
@inproceedings{kauchak2003ecml-learning,
title = {{Learning Rules to Improve a Machine Translation System}},
author = {Kauchak, David and Elkan, Charles},
booktitle = {European Conference on Machine Learning},
year = {2003},
pages = {205--216},
doi = {10.1007/978-3-540-39857-8_20},
url = {https://mlanthology.org/ecmlpkdd/2003/kauchak2003ecml-learning/}
}