A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization
Abstract
A probabilistic analysis of the Rocchio relevance feedback algorithm, one of the most popular learning methods from information retrieval, is presented in a text categorization framework. The analysis results in a probabilistic version of the Rocchio classifier and offers an explanation for the TFIDF word weighting heuristic. The Rocchio classifier, its probabilistic variant and a standard naive Bayes classifier are compared on three text categorization tasks. The results suggest that the probabilistic algorithms are preferable to the heuristic Rocchio classifier.
This research is sponsored by the Wright Laboratory, Aeronautical Systems Center, Air Force Materiel Command, USAF, and the Advanced Research Projects Agency (ARPA) under grant F33615-93-1-1330. The US Government is authorized to reproduce and distribute reprints for Government purposes, notwithstanding any copyright notation thereon. Views and conclusions contained in this document are those of the authors and should not be ...
Cite
Text
Joachims. "A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization." International Conference on Machine Learning, 1997.
Markdown
[Joachims. "A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization." International Conference on Machine Learning, 1997.](https://mlanthology.org/icml/1997/joachims1997icml-probabilistic/)
BibTeX
@inproceedings{joachims1997icml-probabilistic,
title = {{A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization}},
author = {Joachims, Thorsten},
booktitle = {International Conference on Machine Learning},
year = {1997},
pages = {143--151},
url = {https://mlanthology.org/icml/1997/joachims1997icml-probabilistic/}
}