Learning to Probabilistically Identify Authoritative Documents
Abstract
We describe a model of document citation that learns to identify hubs and authorities in a set of linked documents, such as pages retrieved from the world wide web, or papers retrieved from a research paper archive. Unlike the popular HITS algorithm, which relies on dubious statistical assumptions, our model provides probabilistic estimates that have clear semantics. We also find that in general, the identified authoritative documents correspond better to human intuition.
1. Introduction
Bibliometrics has been described as a "series of techniques that seek to quantify the process of written communication" (Ikpaahindi, 1985). It typically attempts to give quantified answers to questions involving the relationships among documents, or authors and documents: "Who are the authoritative authors in this field?" "What are the seminal papers?" "How many distinct communities are studying this subject?" and others (see White & McCain, 1989, for details). Traditionally, the statistics...
Cite
Text
Cohn and Chang. "Learning to Probabilistically Identify Authoritative Documents." International Conference on Machine Learning, 2000.
Markdown
[Cohn and Chang. "Learning to Probabilistically Identify Authoritative Documents." International Conference on Machine Learning, 2000.](https://mlanthology.org/icml/2000/cohn2000icml-learning/)
BibTeX
@inproceedings{cohn2000icml-learning,
title = {{Learning to Probabilistically Identify Authoritative Documents}},
author = {Cohn, David and Chang, Huan},
booktitle = {International Conference on Machine Learning},
year = {2000},
pages = {167--174},
url = {https://mlanthology.org/icml/2000/cohn2000icml-learning/}
}