Modeling General and Specific Aspects of Documents with a Probabilistic Topic Model
Abstract
Techniques such as probabilistic topic models and latent-semantic indexing have been shown to be broadly useful at automatically extracting the topical or semantic content of documents, or more generally for dimension-reduction of sparse count data. These types of models and algorithms can be viewed as generating an abstraction from the words in a document to a lower-dimensional latent variable representation that captures what the document is generally about beyond the specific words it contains. In this paper we propose a new probabilistic model that tempers this approach by representing each document as a combination of (a) a background distribution over common words, (b) a mixture distribution over general topics, and (c) a distribution over words that are treated as being specific to that document. We illustrate how this model can be used for information retrieval by matching documents both at a general topic level and at a specific word level, providing an advantage over techniques that only match documents at a general level (such as topic models or latent-semantic indexing) or that only match documents at the specific word level (such as TF-IDF).
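The three-part document representation can be illustrated with a small sketch. It assumes that each word in a document is produced by one of three routes (general topic, document-specific distribution, or background distribution), chosen with per-document weights, and that the model parameters are already known. All names (`theta`, `phi`, `psi`, `omega`, `lam`, `word_prob`, `generate`, `query_log_likelihood`) are hypothetical, priors and parameter inference are omitted, and scoring a query by its log-likelihood under each document is one plausible retrieval rule, not necessarily the one used in the paper.

```python
import math
import random

# Hypothetical parameterization of one document d (an illustrative assumption):
#   theta: topic weights for d (list, sums to 1)
#   phi:   one word distribution per general topic (list of dicts)
#   psi:   word distribution specific to d (dict)
#   omega: background distribution over common words (dict)
#   lam:   route weights (topic, specific, background), sums to 1


def word_prob(w, theta, phi, psi, omega, lam):
    """p(w | d) as a mixture of the topic, specific, and background routes."""
    p_topic = sum(t * phi_t.get(w, 0.0) for t, phi_t in zip(theta, phi))
    return lam[0] * p_topic + lam[1] * psi.get(w, 0.0) + lam[2] * omega.get(w, 0.0)


def generate(n_words, theta, phi, psi, omega, lam, rng):
    """Sample words: pick a route per word, then a word from that route."""
    words = []
    for _ in range(n_words):
        route = rng.choices(["topic", "specific", "background"], weights=lam)[0]
        if route == "topic":
            z = rng.choices(range(len(theta)), weights=theta)[0]
            dist = phi[z]
        elif route == "specific":
            dist = psi
        else:
            dist = omega
        words.append(rng.choices(list(dist), weights=list(dist.values()))[0])
    return words


def query_log_likelihood(query, doc, floor=1e-12):
    """Score a query against a document; the floor avoids log(0) for unseen words."""
    return sum(math.log(max(word_prob(w, **doc), floor)) for w in query)


# Toy parameters: two topics shared by all documents, one background distribution.
phi = [{"neural": 0.5, "network": 0.5}, {"gene": 0.5, "protein": 0.5}]
omega = {"the": 0.6, "of": 0.4}
doc_a = dict(theta=[0.9, 0.1], phi=phi, psi={"backprop": 1.0}, omega=omega, lam=(0.6, 0.2, 0.2))
doc_b = dict(theta=[0.9, 0.1], phi=phi, psi={"dropout": 1.0}, omega=omega, lam=(0.6, 0.2, 0.2))
doc_c = dict(theta=[0.0, 1.0], phi=phi, psi={"backprop": 1.0}, omega=omega, lam=(0.6, 0.2, 0.2))
```

In this toy setup, `doc_a` and `doc_b` share the same topic mixture, so a query such as `["neural", "backprop"]` separates them only through the document-specific word, while `doc_a` and `doc_c` share a specific word, so a topic-level query such as `["neural", "network"]` separates them only through the topic mixture. This mirrors the abstract's point that matching at both levels can distinguish documents that either level alone would not.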
Cite
Text
Chemudugunta et al. "Modeling General and Specific Aspects of Documents with a Probabilistic Topic Model." Neural Information Processing Systems, 2006.
Markdown
[Chemudugunta et al. "Modeling General and Specific Aspects of Documents with a Probabilistic Topic Model." Neural Information Processing Systems, 2006.](https://mlanthology.org/neurips/2006/chemudugunta2006neurips-modeling/)
BibTeX
@inproceedings{chemudugunta2006neurips-modeling,
title = {{Modeling General and Specific Aspects of Documents with a Probabilistic Topic Model}},
author = {Chemudugunta, Chaitanya and Smyth, Padhraic and Steyvers, Mark},
booktitle = {Neural Information Processing Systems},
year = {2006},
  pages = {241--248},
url = {https://mlanthology.org/neurips/2006/chemudugunta2006neurips-modeling/}
}