Multilingual Topic Models for Unaligned Text
Abstract
We develop the multilingual topic model for unaligned text (MuTo), a probabilistic model of text that is designed to analyze corpora composed of documents in two languages. From these documents, MuTo uses stochastic EM to simultaneously discover both a matching between the languages and multilingual latent topics. We demonstrate that MuTo is able to find shared topics on real-world multilingual corpora, successfully pairing related documents across languages. MuTo provides a new framework for creating multilingual topic models without needing carefully curated parallel corpora and allows applications built using the topic model formalism to be applied to a much wider class of corpora.
Cite
Text
Boyd-Graber and Blei. "Multilingual Topic Models for Unaligned Text." Conference on Uncertainty in Artificial Intelligence, 2009.
Markdown
[Boyd-Graber and Blei. "Multilingual Topic Models for Unaligned Text." Conference on Uncertainty in Artificial Intelligence, 2009.](https://mlanthology.org/uai/2009/boydgraber2009uai-multilingual/)
BibTeX
@inproceedings{boydgraber2009uai-multilingual,
title = {{Multilingual Topic Models for Unaligned Text}},
author = {Boyd-Graber, Jordan L. and Blei, David M.},
booktitle = {Conference on Uncertainty in Artificial Intelligence},
year = {2009},
pages = {75-82},
url = {https://mlanthology.org/uai/2009/boydgraber2009uai-multilingual/}
}