HM-BiTAM: Bilingual Topic Exploration, Word Alignment, and Translation
Abstract
We present a novel paradigm for statistical machine translation (SMT), based on joint modeling of word alignment and the topical aspects underlying bilingual document pairs via a hidden Markov Bilingual Topic AdMixture (HM-BiTAM). In this paradigm, the parallel sentence pairs of a parallel document pair are coupled through a shared semantic flow, which keeps the topical context coherent when matching words are aligned across languages during likelihood-based training of topic-dependent translation lexicons and of topic representations in each language. The trained HM-BiTAM can not only display topic patterns as methods such as LDA do, now for bilingual corpora, but also offers a principled way of inferring the optimal translation in a context-dependent manner. Our method integrates the conventional HMM-based IBM Models, a key component of most state-of-the-art SMT systems, with the recently proposed BiTAM model, and we report an extensive empirical analysis of our method (in many ways complementary to the description-oriented treatment) in three aspects: word alignment, bilingual topic representation, and translation.
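The coupling described in the abstract can be illustrated with a toy generative sampler. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes one topic per sentence pair drawn from a document-level topic mixture, a first-order HMM over alignment positions, and a topic-dependent lexicon p(f | e, z). All names (`sample_dirichlet`, `generate_foreign_sentence`, `jump_weight`, `TOY_LEXICONS`) and all probabilities are hypothetical and chosen only for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw a topic mixture from Dirichlet(alpha) using normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def jump_weight(distance):
    """Hypothetical HMM jump weight: favors moving one position forward."""
    return 0.5 ** abs(distance - 1)


def generate_foreign_sentence(english, length, theta, lexicons, rng):
    """Sample a topic, an alignment chain, and foreign words for one sentence pair.

    english  : list of English tokens (treated as given here)
    length   : number of foreign tokens to generate
    theta    : document-level topic mixture (shared by all sentence pairs)
    lexicons : lexicons[z][e] maps foreign words to p(f | e, z)
    """
    # One topic per sentence pair, drawn from the document's mixture (assumption).
    topic = rng.choices(range(len(theta)), weights=theta, k=1)[0]
    positions = range(len(english))
    alignments, foreign = [], []
    previous = None
    for _ in range(length):
        # First-order HMM over alignment positions: the next position
        # depends only on the previous one, through the jump distance.
        if previous is None:
            weights = [1.0 for _ in positions]
        else:
            weights = [jump_weight(i - previous) for i in positions]
        a = rng.choices(positions, weights=weights, k=1)[0]
        # Topic-dependent lexicon: the same English word can translate
        # differently under different topics.
        table = lexicons[topic][english[a]]
        words = list(table)
        f = rng.choices(words, weights=[table[w] for w in words], k=1)[0]
        alignments.append(a)
        foreign.append(f)
        previous = a
    return topic, alignments, foreign


# Toy lexicons: topic 0 is "finance", topic 1 is "river" (made-up numbers).
TOY_LEXICONS = [
    {"the": {"la": 1.0}, "bank": {"banque": 0.9, "rive": 0.1}},
    {"the": {"la": 1.0}, "bank": {"banque": 0.1, "rive": 0.9}},
]

if __name__ == "__main__":
    rng = random.Random(0)
    theta = sample_dirichlet([0.5, 0.5], rng)
    print(generate_foreign_sentence(["the", "bank"], 2, theta, TOY_LEXICONS, rng))
```

Because every sentence pair in a document draws its topic from the same mixture `theta`, lexical choices such as the translation of "bank" tend to stay consistent within a document, which is the intuition behind the topical coherence mentioned above. Training and inference are not shown here.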
Cite
Text
Zhao and Xing. "HM-BiTAM: Bilingual Topic Exploration, Word Alignment, and Translation." Neural Information Processing Systems, 2007.
Markdown
[Zhao and Xing. "HM-BiTAM: Bilingual Topic Exploration, Word Alignment, and Translation." Neural Information Processing Systems, 2007.](https://mlanthology.org/neurips/2007/zhao2007neurips-hmbitam/)
BibTeX
@inproceedings{zhao2007neurips-hmbitam,
title = {{HM-BiTAM: Bilingual Topic Exploration, Word Alignment, and Translation}},
author = {Zhao, Bing and Xing, Eric P.},
booktitle = {Neural Information Processing Systems},
year = {2007},
pages = {1689-1696},
url = {https://mlanthology.org/neurips/2007/zhao2007neurips-hmbitam/}
}