The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity
Abstract
We describe a joint probabilistic model of the contents and inter-connectivity of document collections such as sets of web pages or research paper archives. The model is based on a probabilistic factor decomposition and identifies the principal topics of a collection as well as the authoritative documents within those topics. Furthermore, the relationships between topics are mapped out in order to build a predictive model of link content. Applications of this approach include information retrieval and search, topic identification, query disambiguation, focused web crawling, web authoring, and bibliometric analysis.
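One way to picture such a factor decomposition is a minimal sketch in which every document d has a mixture over latent topics P(z|d). That mixture is shared by a term model P(t|z) and a link model P(c|z), where c is a cited document. The two parts are combined with a mixing weight `alpha` and fitted with plain EM. The weight `alpha`, the per-document frequency normalization, and the helper names (`fit_joint_model`, `top_authorities`, `log_likelihood`) are illustrative assumptions rather than details stated in the abstract. Ranking documents by P(c|z) is one plausible reading of "authoritative documents within those topics".

```python
import math
import random


def normalize(xs):
    """Scale non-negative numbers to sum to 1 (uniform if they are all zero)."""
    total = sum(xs)
    if total == 0:
        return [1.0 / len(xs)] * len(xs)
    return [x / total for x in xs]


def log_likelihood(w_t, w_c, p_z_d, p_t_z, p_c_z, alpha):
    """Weighted objective: alpha * (term log-likelihood) + (1 - alpha) * (link log-likelihood)."""
    n_topics = len(p_t_z)
    total = 0.0
    for j, theta in enumerate(p_z_d):
        for weights, emit, mix in ((w_t[j], p_t_z, alpha), (w_c[j], p_c_z, 1 - alpha)):
            for i, w in enumerate(weights):
                if w > 0:
                    p = sum(theta[k] * emit[k][i] for k in range(n_topics))
                    total += mix * w * math.log(p)
    return total


def fit_joint_model(terms, links, n_topics, alpha=0.5, iters=50, seed=0):
    """Fit P(z|d), P(t|z), P(c|z) by EM (hypothetical helper, illustrative only).

    terms[j][i] counts term i in document j; links[j][l] counts links from
    document j to document l. Returns (p_z_d, p_t_z, p_c_z, history), where
    history is the weighted log-likelihood after each EM iteration.
    """
    rng = random.Random(seed)
    n_docs, n_terms, n_targets = len(terms), len(terms[0]), len(links[0])

    def rand_dist(n):
        # Strictly positive random start so no topic is ruled out initially.
        return normalize([rng.random() + 0.1 for _ in range(n)])

    p_z_d = [rand_dist(n_topics) for _ in range(n_docs)]
    p_t_z = [rand_dist(n_terms) for _ in range(n_topics)]
    p_c_z = [rand_dist(n_targets) for _ in range(n_topics)]
    # Per-document relative frequencies, so long documents do not dominate.
    w_t = [normalize(row) if sum(row) else [0.0] * n_terms for row in terms]
    w_c = [normalize(row) if sum(row) else [0.0] * n_targets for row in links]
    history = []
    for _ in range(iters):
        new_t = [[0.0] * n_terms for _ in range(n_topics)]
        new_c = [[0.0] * n_targets for _ in range(n_topics)]
        new_z = [[0.0] * n_topics for _ in range(n_docs)]
        for j in range(n_docs):
            parts = ((w_t[j], p_t_z, new_t, alpha), (w_c[j], p_c_z, new_c, 1 - alpha))
            for weights, emit, acc, mix in parts:
                for i, w in enumerate(weights):
                    if w == 0:
                        continue
                    # E-step: posterior over the shared topics for this (document, item) pair.
                    post = normalize([p_z_d[j][k] * emit[k][i] for k in range(n_topics)])
                    for k in range(n_topics):
                        acc[k][i] += w * post[k]
                        new_z[j][k] += mix * w * post[k]
        # M-step: renormalize the expected counts into conditional distributions.
        p_t_z = [normalize(row) for row in new_t]
        p_c_z = [normalize(row) for row in new_c]
        p_z_d = [normalize(row) for row in new_z]
        history.append(log_likelihood(w_t, w_c, p_z_d, p_t_z, p_c_z, alpha))
    return p_z_d, p_t_z, p_c_z, history


def top_authorities(p_c_z, topic, n=2):
    """Documents most likely to be cited from a topic, ranked by P(c|z)."""
    return sorted(range(len(p_c_z[topic])), key=lambda c: -p_c_z[topic][c])[:n]


# Toy corpus: docs 0-1 share terms 0-1 and cite each other; docs 2-3 likewise with terms 2-3.
terms = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 3, 2], [0, 0, 2, 3]]
links = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
p_z_d, p_t_z, p_c_z, history = fit_joint_model(terms, links, n_topics=2)
```

Because the same P(z|d) must explain both a document's words and its outgoing links, the topics it learns are pushed to agree with the link structure as well as the text. Setting `alpha` to 1 or 0 in this sketch reduces it to a text-only or link-only factor model.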
Cite
Text
Cohn and Hofmann. "The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity." Neural Information Processing Systems, 2000.
Markdown
[Cohn and Hofmann. "The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity." Neural Information Processing Systems, 2000.](https://mlanthology.org/neurips/2000/cohn2000neurips-missing/)
BibTeX
@inproceedings{cohn2000neurips-missing,
title = {{The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity}},
author = {Cohn, David A. and Hofmann, Thomas},
booktitle = {Neural Information Processing Systems},
year = {2000},
pages = {430--436},
url = {https://mlanthology.org/neurips/2000/cohn2000neurips-missing/}
}