Learning Rules to Improve a Machine Translation System

Abstract

In this paper we show how to learn rules to improve the performance of a machine translation system. Given a system consisting of two translation functions (one from language A to language B and one from B to A), training text is translated from A to B and back again to A. Using these two translations, differences in knowledge between the two translation functions are identified, and rules are learned to improve the functions. Context-independent rules are learned where the information suggests only a single possible translation for a word. When there are multiple alternative translations for a word, a likelihood ratio test is used to identify words that co-occur significantly with each case. These words are then used as context in context-dependent rules. Applied to the Pan American Health Organization corpus of 20,084 sentences, the learned rules improve the understandability of the translation produced by the SDL International engine on 78% of sentences, with high precision.
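The context-word selection step can be illustrated with a small sketch. The abstract names a likelihood ratio test but does not give its exact form, so the code below assumes the common G² (log-likelihood ratio) statistic on a 2x2 contingency table. The function names, the toy contexts, and the threshold of 3.841 (the 5% critical value of a chi-squared distribution with one degree of freedom) are all illustrative assumptions, not details taken from the paper.

```python
import math


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table (assumed form of the test).

    k11: contexts of translation 1 that contain the candidate word
    k12: contexts of translation 1 that do not contain it
    k21: contexts of translation 2 that contain the candidate word
    k22: contexts of translation 2 that do not contain it
    """
    total = k11 + k12 + k21 + k22
    if total == 0:
        return 0.0
    row1, row2 = k11 + k12, k21 + k22
    col1, col2 = k11 + k21, k12 + k22
    cells = ((k11, row1, col1), (k12, row1, col2),
             (k21, row2, col1), (k22, row2, col2))
    g2 = 0.0
    for observed, row, col in cells:
        if observed > 0:  # a zero cell contributes nothing (0 * ln 0 = 0)
            expected = row * col / total
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2


def significant_context_words(contexts_1, contexts_2, threshold=3.841):
    """Return {word: (favored_case, score)} for words whose occurrence differs
    significantly between the contexts of two alternative translations.

    Each context is a set of words; the threshold is a hypothetical choice.
    """
    vocabulary = set().union(*contexts_1, *contexts_2)
    selected = {}
    for word in vocabulary:
        k11 = sum(word in context for context in contexts_1)
        k12 = len(contexts_1) - k11
        k21 = sum(word in context for context in contexts_2)
        k22 = len(contexts_2) - k21
        score = log_likelihood_ratio(k11, k12, k21, k22)
        if score >= threshold:
            # Compare occurrence rates k11/n1 and k21/n2 without dividing.
            favored = 1 if k11 * len(contexts_2) > k21 * len(contexts_1) else 2
            selected[word] = (favored, score)
    return selected


# Toy data (invented): contexts seen with each of two translations of a word.
contexts_1 = [{"the", "river", "water"} for _ in range(5)]
contexts_2 = [{"the", "money", "loan"} for _ in range(5)]
context_words = significant_context_words(contexts_1, contexts_2)
```

In this toy setup a word that appears equally often with both translations scores zero and is not selected, while a word that appears with only one of them is kept as candidate context for a context-dependent rule.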

Cite

Text

Kauchak and Elkan. "Learning Rules to Improve a Machine Translation System." European Conference on Machine Learning, 2003. doi:10.1007/978-3-540-39857-8_20

Markdown

[Kauchak and Elkan. "Learning Rules to Improve a Machine Translation System." European Conference on Machine Learning, 2003.](https://mlanthology.org/ecmlpkdd/2003/kauchak2003ecml-learning/) doi:10.1007/978-3-540-39857-8_20

BibTeX

@inproceedings{kauchak2003ecml-learning,
  title     = {{Learning Rules to Improve a Machine Translation System}},
  author    = {Kauchak, David and Elkan, Charles},
  booktitle = {European Conference on Machine Learning},
  year      = {2003},
  pages     = {205--216},
  doi       = {10.1007/978-3-540-39857-8_20},
  url       = {https://mlanthology.org/ecmlpkdd/2003/kauchak2003ecml-learning/}
}