Short and Sparse Text Topic Modeling via Self-Aggregation
Abstract
The overwhelming amount of short text data on social media and elsewhere has posed great challenges to topic modeling due to the sparsity problem. Most existing attempts to alleviate this problem resort to heuristic strategies to aggregate short texts into pseudo-documents before the application of standard topic modeling. Although such strategies do not generalize well to broader genres of short texts, their success has shed light on how to develop a generalized solution. In this paper, we present a novel model towards this goal by integrating topic modeling with short text aggregation during topic inference. The aggregation is founded on general topical affinity of texts rather than particular heuristics, making the model readily applicable to various short texts. Experimental results on real-world datasets validate the effectiveness of this new model, suggesting that it can distill more meaningful topics from short texts.
Cite
Text
Quan et al. "Short and Sparse Text Topic Modeling via Self-Aggregation." International Joint Conference on Artificial Intelligence, 2015.
Markdown
[Quan et al. "Short and Sparse Text Topic Modeling via Self-Aggregation." International Joint Conference on Artificial Intelligence, 2015.](https://mlanthology.org/ijcai/2015/quan2015ijcai-short/)
BibTeX
@inproceedings{quan2015ijcai-short,
title = {{Short and Sparse Text Topic Modeling via Self-Aggregation}},
author = {Quan, Xiaojun and Kit, Chunyu and Ge, Yong and Pan, Sinno Jialin},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2015},
pages = {2270--2276},
url = {https://mlanthology.org/ijcai/2015/quan2015ijcai-short/}
}