Mismatch String Kernels for SVM Protein Classification

Abstract

We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the protein classification problem. These kernels measure sequence similarity based on shared occurrences of k-length subsequences, counted with up to m mismatches, and do not rely on any generative model for the positive training sequences. We compute the kernels efficiently using a mismatch tree data structure and report experiments on a benchmark SCOP dataset, where we show that the mismatch kernel used with an SVM classifier performs as well as the Fisher kernel, the most successful method for remote homology detection, while achieving considerable computational savings.
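To make the kernel definition concrete, the following is a minimal brute-force sketch in Python, assuming the usual (k, m)-mismatch feature map: each k-mer in a sequence adds one count to every k-length string within Hamming distance m of it, and the kernel is the inner product of the resulting count vectors. It is not the mismatch tree computation described in the abstract, and it is only practical for small k and m. The names `mismatch_neighborhood`, `feature_map`, and `mismatch_kernel`, and the 20-letter `AMINO_ACIDS` alphabet, are illustrative choices rather than anything taken from the paper.

```python
from collections import Counter
from itertools import combinations, product

# Assumed alphabet: the 20 standard amino acids (illustrative choice).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mismatch_neighborhood(kmer, m, alphabet=AMINO_ACIDS):
    """Return all strings of len(kmer) within Hamming distance m of kmer."""
    neighbors = {kmer}
    for d in range(1, m + 1):
        # Choose d positions to alter, then every letter assignment for them.
        for positions in combinations(range(len(kmer)), d):
            for letters in product(alphabet, repeat=d):
                chars = list(kmer)
                for pos, letter in zip(positions, letters):
                    chars[pos] = letter
                # The set removes duplicates (e.g. a letter replaced by itself).
                neighbors.add("".join(chars))
    return neighbors


def feature_map(seq, k, m, alphabet=AMINO_ACIDS):
    """Count, for each k-length string, the k-mers of seq within m mismatches."""
    phi = Counter()
    for i in range(len(seq) - k + 1):
        for neighbor in mismatch_neighborhood(seq[i:i + k], m, alphabet):
            phi[neighbor] += 1
    return phi


def mismatch_kernel(x, y, k, m, alphabet=AMINO_ACIDS):
    """Inner product of the (k, m)-mismatch feature vectors of x and y."""
    phi_x = feature_map(x, k, m, alphabet)
    phi_y = feature_map(y, k, m, alphabet)
    # Counter returns 0 for missing keys, so only shared features contribute.
    return sum(count * phi_y[key] for key, count in phi_x.items())
```

With m = 0 the function reduces to exact k-mer matching. A kernel matrix built this way could be passed to any SVM implementation that accepts precomputed kernels; normalizing by the square root of the product of the two self-similarities is a common optional step and is left out here.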

Cite

Text

Eskin et al. "Mismatch String Kernels for SVM Protein Classification." Neural Information Processing Systems, 2002.

Markdown

[Eskin et al. "Mismatch String Kernels for SVM Protein Classification." Neural Information Processing Systems, 2002.](https://mlanthology.org/neurips/2002/eskin2002neurips-mismatch/)

BibTeX

@inproceedings{eskin2002neurips-mismatch,
  title     = {{Mismatch String Kernels for SVM Protein Classification}},
  author    = {Eskin, Eleazar and Weston, Jason and Noble, William S. and Leslie, Christina S.},
  booktitle = {Neural Information Processing Systems},
  year      = {2002},
  pages     = {1441--1448},
  url       = {https://mlanthology.org/neurips/2002/eskin2002neurips-mismatch/}
}