InfoCTM: A Mutual Information Maximization Perspective of Cross-Lingual Topic Modeling

Abstract

Cross-lingual topic models are widely used for cross-lingual text analysis because they reveal aligned latent topics. However, most existing methods produce repetitive topics that hinder further analysis, and their performance declines when the dictionaries they rely on have low coverage. In this paper, we propose Cross-lingual Topic Modeling with Mutual Information (InfoCTM). Instead of the direct alignment used in previous work, we propose a topic alignment method based on mutual information. It acts as a regularizer that properly aligns topics and prevents degenerate topic representations of words, which mitigates the repetitive topic issue. To address the low-coverage dictionary issue, we further propose a cross-lingual vocabulary linking method that finds more linked cross-lingual words for topic alignment beyond the translations of a given dictionary. Extensive experiments on English, Chinese, and Japanese datasets demonstrate that our method outperforms state-of-the-art baselines, producing more coherent, diverse, and well-aligned topics and showing better transferability for cross-lingual classification tasks.
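To make the idea of aligning topics through mutual information more concrete, the sketch below uses an InfoNCE-style contrastive loss, a common lower-bound surrogate for mutual information. The abstract does not state which estimator InfoCTM uses, so this form is an assumption, and the function name `info_nce_alignment_loss`, its parameters, and the temperature value are hypothetical names chosen for illustration only.

```python
import numpy as np


def info_nce_alignment_loss(phi_src, phi_tgt, links, temperature=0.2):
    """InfoNCE-style loss over linked cross-lingual word pairs (illustrative).

    phi_src: (V_src, K) array, topic representations of source-language words.
    phi_tgt: (V_tgt, K) array, topic representations of target-language words.
    links:   list of (i, j) pairs, source word i is linked to target word j.
    """
    # L2-normalise rows so the dot product is a cosine similarity.
    src = phi_src / np.linalg.norm(phi_src, axis=1, keepdims=True)
    tgt = phi_tgt / np.linalg.norm(phi_tgt, axis=1, keepdims=True)

    # Scaled similarity of every source word to every target word.
    sim = src @ tgt.T / temperature  # shape (V_src, V_tgt)

    # Numerically stable log-softmax over target words.
    sim = sim - sim.max(axis=1, keepdims=True)
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))

    # Average negative log-probability of the linked (positive) pairs.
    # All other target words act as negatives in this simplified sketch.
    rows, cols = zip(*links)
    return -log_prob[list(rows), list(cols)].mean()
```

Under this assumed loss, linked words are pulled toward similar topic representations while unlinked words are pushed apart. If every word collapsed to the same representation, the loss would stay at the log of the target vocabulary size rather than approaching zero, which is one way to read the abstract's claim that the regularization discourages degenerate representations.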

Cite

Text

Wu et al. "InfoCTM: A Mutual Information Maximization Perspective of Cross-Lingual Topic Modeling." AAAI Conference on Artificial Intelligence, 2023. doi:10.1609/AAAI.V37I11.26612

Markdown

[Wu et al. "InfoCTM: A Mutual Information Maximization Perspective of Cross-Lingual Topic Modeling." AAAI Conference on Artificial Intelligence, 2023.](https://mlanthology.org/aaai/2023/wu2023aaai-infoctm/) doi:10.1609/AAAI.V37I11.26612

BibTeX

@inproceedings{wu2023aaai-infoctm,
  title     = {{InfoCTM: A Mutual Information Maximization Perspective of Cross-Lingual Topic Modeling}},
  author    = {Wu, Xiaobao and Dong, Xinshuai and Nguyen, Thong and Liu, Chaoqun and Pan, Liangming and Luu, Anh Tuan},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2023},
  pages     = {13763-13771},
  doi       = {10.1609/AAAI.V37I11.26612},
  url       = {https://mlanthology.org/aaai/2023/wu2023aaai-infoctm/}
}