A Probabilistic Model for Text Kernels
Abstract
This paper explores several kernels in the context of text classification. A novel view of how documents might have been created is introduced, and kernels are derived from this framework. The relations among these kernels, and between them and the Gaussian kernel, are discussed. Moreover, the popular tf-idf weighting scheme is derived as a natural consequence. Finally, the kernels are evaluated on the Reuters Corpus Volume I newswire database to assess their quality in a topic classification application.
Cite
Text
Lehmann and Shawe-Taylor. "A Probabilistic Model for Text Kernels." International Conference on Machine Learning, 2006. doi:10.1145/1143844.1143912
Markdown
[Lehmann and Shawe-Taylor. "A Probabilistic Model for Text Kernels." International Conference on Machine Learning, 2006.](https://mlanthology.org/icml/2006/lehmann2006icml-probabilistic/) doi:10.1145/1143844.1143912
BibTeX
@inproceedings{lehmann2006icml-probabilistic,
title = {{A Probabilistic Model for Text Kernels}},
author = {Lehmann, Alain D. and Shawe-Taylor, John},
booktitle = {International Conference on Machine Learning},
year = {2006},
pages = {537--544},
doi = {10.1145/1143844.1143912},
url = {https://mlanthology.org/icml/2006/lehmann2006icml-probabilistic/}
}