Robust Multi-View Topic Modeling by Incorporating Detecting Anomalies

Abstract

Multi-view text data consist of texts from different sources. For instance, multilingual Wikipedia corpora contain articles in different languages, which are created by different groups of users. Because multi-view text data are often created in a distributed fashion, information from different sources may not be consistent. Such inconsistency introduces noise into the analysis of this kind of data. In this paper, we propose a probabilistic topic model for multi-view data that is robust against noise. The proposed model can also be used for detecting anomalies. In our experiments on Wikipedia data sets, the proposed model is more robust than existing multi-view topic models in terms of held-out perplexity.

Cite

Text

Zhang et al. "Robust Multi-View Topic Modeling by Incorporating Detecting Anomalies." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2017. doi:10.1007/978-3-319-71246-8_15

Markdown

[Zhang et al. "Robust Multi-View Topic Modeling by Incorporating Detecting Anomalies." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2017.](https://mlanthology.org/ecmlpkdd/2017/zhang2017ecmlpkdd-robust/) doi:10.1007/978-3-319-71246-8_15

BibTeX

@inproceedings{zhang2017ecmlpkdd-robust,
  title     = {{Robust Multi-View Topic Modeling by Incorporating Detecting Anomalies}},
  author    = {Zhang, Guoxi and Iwata, Tomoharu and Kashima, Hisashi},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2017},
  pages     = {238--250},
  doi       = {10.1007/978-3-319-71246-8_15},
  url       = {https://mlanthology.org/ecmlpkdd/2017/zhang2017ecmlpkdd-robust/}
}