A Probabilistic Model for Text Kernels

Abstract

This paper explores several kernels in the context of text classification. A novel view of how documents might have been created is introduced, and kernels are derived from this framework. The relations among these kernels, and between them and the Gaussian kernel, are discussed. Moreover, the popular tf-idf weighting scheme is derived as a natural consequence. Finally, the kernels are evaluated on the Reuters Corpus Volume I newswire database to assess their quality in a topic classification application.

Cite

Text

Lehmann and Shawe-Taylor. "A Probabilistic Model for Text Kernels." International Conference on Machine Learning, 2006. doi:10.1145/1143844.1143912

Markdown

[Lehmann and Shawe-Taylor. "A Probabilistic Model for Text Kernels." International Conference on Machine Learning, 2006.](https://mlanthology.org/icml/2006/lehmann2006icml-probabilistic/) doi:10.1145/1143844.1143912

BibTeX

@inproceedings{lehmann2006icml-probabilistic,
  title     = {{A Probabilistic Model for Text Kernels}},
  author    = {Lehmann, Alain D. and Shawe-Taylor, John},
  booktitle = {International Conference on Machine Learning},
  year      = {2006},
  pages     = {537--544},
  doi       = {10.1145/1143844.1143912},
  url       = {https://mlanthology.org/icml/2006/lehmann2006icml-probabilistic/}
}