HM-BiTAM: Bilingual Topic Exploration, Word Alignment, and Translation

Abstract

We present a novel paradigm for statistical machine translation (SMT), based on jointly modeling word alignment and the topical aspects underlying bilingual document pairs via a hidden Markov Bilingual Topic AdMixture (HM-BiTAM). In this paradigm, the parallel sentence pairs within a parallel document pair are coupled via a semantic flow. This coupling keeps the topical context coherent when matching words are aligned across languages during likelihood-based training of topic-dependent translation lexicons and of the topic representations in each language. Like methods such as LDA, the trained HM-BiTAM can display topic patterns, but now for bilingual corpora; it also offers a principled way to infer optimal translations in a context-dependent way. Our method integrates the conventional HMM-based IBM alignment models, a key component of most state-of-the-art SMT systems, with the recently proposed BiTAM model. We report an extensive empirical analysis of our method, in many ways complementary to a description-oriented treatment, in three aspects: word alignment, bilingual topic representation, and translation.
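To make the coupling of topics and HMM alignment concrete, the following is a minimal sketch of a *generic* topic-conditioned HMM alignment likelihood, not the paper's exact formulation. It assumes one hidden topic per sentence pair drawn from a document-level mixture `theta`, one topic-specific lexicon p(f | e, topic) per topic, a uniform initial alignment state, jump-width-based transitions, no NULL word, and a small floor probability for unseen word pairs. All function names (`hmm_likelihood`, `topic_mixture_likelihood`, `monotone_bias`) are hypothetical, and the English-side monolingual topic term is omitted.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical lexicon type: (english_word, foreign_word) -> p(f | e, topic)
Lexicon = Dict[Tuple[str, str], float]


def hmm_likelihood(src: Sequence[str], tgt: Sequence[str], lexicon: Lexicon,
                   jump_weight: Callable[[int], float],
                   floor: float = 1e-6) -> float:
    """p(tgt | src) under a first-order HMM alignment model (forward algorithm).

    The hidden state at foreign position j is the English position a_j it aligns to.
    """
    n = len(src)
    # Transition p(a_j = i | a_{j-1} = ip) depends only on the jump width i - ip,
    # normalized over the next state i.
    trans = []
    for ip in range(n):
        row = [jump_weight(i - ip) for i in range(n)]
        total = sum(row)
        trans.append([w / total for w in row])

    def emit(i: int, f: str) -> float:
        # Topic-specific translation probability p(f | e_i); floor for unseen pairs.
        return lexicon.get((src[i], f), floor)

    # Uniform initial distribution over English positions.
    alpha = [emit(i, tgt[0]) / n for i in range(n)]
    for f in tgt[1:]:
        alpha = [emit(i, f) * sum(alpha[ip] * trans[ip][i] for ip in range(n))
                 for i in range(n)]
    return sum(alpha)


def topic_mixture_likelihood(src: Sequence[str], tgt: Sequence[str],
                             theta: List[float], lexicons: List[Lexicon],
                             jump_weight: Callable[[int], float]) -> float:
    """Marginalize a single sentence-level topic z ~ Mult(theta) over K lexicons."""
    return sum(t * hmm_likelihood(src, tgt, lex, jump_weight)
               for t, lex in zip(theta, lexicons))


def monotone_bias(d: int) -> float:
    # Hypothetical jump weight: largest for d = 1 (move to the next English word).
    return 1.0 / (1.0 + abs(d - 1))


if __name__ == "__main__":
    # Toy lexicons for two topics (made-up numbers for illustration only).
    lex_topic0 = {("a", "x"): 0.7, ("a", "y"): 0.3, ("b", "x"): 0.2, ("b", "y"): 0.8}
    lex_topic1 = {("a", "x"): 0.1, ("a", "y"): 0.9, ("b", "x"): 0.6, ("b", "y"): 0.4}
    p = topic_mixture_likelihood(["a", "b"], ["x", "y"], [0.3, 0.7],
                                 [lex_topic0, lex_topic1], monotone_bias)
    print(p)
```

Because the topic is shared by all words in a sentence pair, each topic's lexicon scores an entire alignment path, which is one simple way to realize topic-dependent lexicons inside an HMM aligner.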

Cite

Text

Zhao and Xing. "HM-BiTAM: Bilingual Topic Exploration, Word Alignment, and Translation." Neural Information Processing Systems, 2007.

Markdown

[Zhao and Xing. "HM-BiTAM: Bilingual Topic Exploration, Word Alignment, and Translation." Neural Information Processing Systems, 2007.](https://mlanthology.org/neurips/2007/zhao2007neurips-hmbitam/)

BibTeX

@inproceedings{zhao2007neurips-hmbitam,
  title     = {{HM-BiTAM: Bilingual Topic Exploration, Word Alignment, and Translation}},
  author    = {Zhao, Bing and Xing, Eric P.},
  booktitle = {Neural Information Processing Systems},
  year      = {2007},
  pages     = {1689--1696},
  url       = {https://mlanthology.org/neurips/2007/zhao2007neurips-hmbitam/}
}